AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR APPLYING A COMPOSABLE ASSURANCE INTEGRITY FRAMEWORK

A system for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence is provided. The system receives first data representing a plurality of statements and second data representing corroborating evidence. The system applies one or more integrity analysis models to the first data and the second data in order to generate an assessment of a risk that one or more of the plurality of statements represents a material misstatement. A system for generating an assessment of faithfulness of data is also provided. The system compares first data representing a statement to second data representing corroborating evidence, and generates a similarity metric representing their similarity. Based on the similarity metric, the system generates an output representing an assessment of faithfulness of the first data.

Description
CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims the benefit of U.S. Provisional Application No. 63/217,119 filed Jun. 30, 2021; U.S. Provisional Application No. 63/217,123 filed Jun. 30, 2021; U.S. Provisional Application No. 63/217,127 filed Jun. 30, 2021; U.S. Provisional Application No. 63/217,131 filed Jun. 30, 2021; and U.S. Provisional Application No. 63/217,134, filed Jun. 30, 2021, the entire contents of each of which are incorporated herein by reference.

FIELD

This relates generally to AI-augmented data processing, and more specifically to data processing systems and methods applying a composable assurance integrity framework and a context-aware data integrity framework.

BACKGROUND

Performing audits manually is time-consuming, expensive, prone to introducing human error, and prone to introducing human bias. Furthermore, due to the inherent limitations of manual auditing, sampling approaches are used instead of full-population testing. Sampling approaches attempt to select a representative sample, but there is no way to guarantee that important information is not missed in the data that is not selected for review.

Furthermore, according to known techniques, vouching and tracing, which may be performed pursuant to an audit process, are conducted independently as two separate processes and rely on audit sampling.

SUMMARY

As explained above, performing audits manually is time-consuming, expensive, prone to introducing human error, and prone to introducing human bias. Furthermore, due to the inherent limitations of manual auditing, sampling approaches are used instead of full-population testing. Sampling approaches attempt to select a representative sample, but there is no way to guarantee that important information is not missed in the data that is not selected for review. Accordingly, attempts have been made to automate parts of the auditing process. However, the introduction of technologies into audit approaches has mostly focused on substantive testing, control testing, or risk assessment. Furthermore, in existing audit systems that have introduced one or more technologies, the uncertainties introduced by the technology itself have been largely ignored or, at most, imprecisely and inaccurately accounted for by attempted human review. Further still, existing audit systems that have introduced one or more technologies have focused on narrow approaches for a single, specific financial statement line item (FSLI). Thus, there is a lack of a consistent framework for addressing 20 or more FSLIs. Narrow solutions for each FSLI audit are difficult to generalize and nearly impossible to scale effectively and economically. Additionally, insights offered by existing systems for financial data do not distinguish between transactions that have been fully contextualized and those that have not.

Furthermore, according to known techniques, vouching and tracing, which may be performed pursuant to an audit process, are conducted independently as two separate processes and rely on audit sampling. However, known systems and methods for information integrity do not handle fuzzy comparison, do not leverage the context of the evidence (e.g., master data, industry ontology, industry and client knowledge), do not leverage multiple pieces of evidence to establish data integrity, do not address the challenge that evidence might have been amended or updated, and do not address one-to-many, many-to-one, or many-to-many relationships.

Accordingly, there is a need for improved systems and methods that address one or more of the above-identified shortcomings of known systems for automated auditing. Specifically, there is a need for an end-to-end transformation of audit approaches based on technologies. There is a need for systems and methods for AI-augmented auditing platforms providing a composable assurance integrity framework, the ability to accurately and automatically account for uncertainties introduced by technologies, the ability to apply in a generalized manner to multiple FSLIs, and the ability to distinguish between transactions that have been fully contextualized as compared to those which have not. The systems and methods described herein may meet one or more of these needs. Disclosed herein are systems and methods configured to review a plurality of FSLIs and to determine, based on evidence data reviewed, whether any of the FSLIs include a material misstatement. The system may address one or more of the above-identified needs by providing a composable framework that can be adapted to each of a plurality of FSLIs, allowing the system to be flexible and adaptable in addressing a broad spectrum of variations in terms of industry, business practices, and fast-changing business environments. The composable framework provided by the systems described herein may provide a consistent methodology for tracing activities within financial operations of a business and for determining potential materiality of misrepresentation of financial statements.

Furthermore, there is a need for improved systems and methods that address one or more of the above-identified shortcomings in known methods for vouching and tracing. Disclosed herein are methods and systems for performing automated (or semi-automated) data processing operations for auditing processes, wherein vouching and tracing (e.g., for FSLI audit for multiple documents and ERP records) are conducted semi-automatically or fully automatically at the same time, wherein the specification and the actual matching of the corresponding fields in the ledger and the supporting source documents are performed automatically.

In some embodiments, a first system is provided, the first system being for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, the first system comprising one or more processors configured to cause the first system to: receive a first data set representing a plurality of statements; receive a second data set comprising corroborating evidence related to one or more of the plurality of statements; and apply one or more integrity analysis models to the first data set and the second data set in order to generate an assessment of a risk that one or more of the plurality of statements represents a material misstatement.

In some embodiments of the first system, applying the one or more integrity analysis models comprises applying one or more process integrity analysis models to trace one or more changes represented by the plurality of statements.

In some embodiments of the first system, applying the one or more integrity analysis models comprises applying one or more data integrity analysis models to generate an assessment of fidelity of information in one or more of the first data set and the second data set to a ground truth represented by the information.

In some embodiments of the first system, applying the one or more integrity analysis models comprises applying one or more policy integrity models to generate output data comprising an adjudication according to an assurance knowledge substrate, wherein the adjudication is based on all or part of one or both of: the plurality of statements and the corroborating evidence.

In some embodiments of the first system, the assurance knowledge substrate includes data representing one or more of the following: industry practice of an industry related to one or more of the plurality of statements, historical behavior related to one or more parties relevant to one or more of the plurality of statements, one or more accounting policies, and one or more auditing standards.

In some embodiments of the first system, the assessment of a risk that one or more of the plurality of statements represents a material misstatement is associated with a level selected from: a transaction level, an account level, and a line-item level.

In some embodiments of the first system, generating the assessment of a risk is based at least in part on an assessed level of risk attributable to one or more automated processes used in generating or processing one or both of the first and second data sets.

In some embodiments of the first system, generating the assessment of a risk comprises performing full-population testing on the first data set and the second data set.

In some embodiments of the first system, performing full-population testing comprises: applying one or more process integrity models based on ERP data included in one or both of the first data set and the second data set; and applying one or more data integrity models based on corroborating evidence in the second data set.

In some embodiments of the first system, the one or more processors are configured to apply the assessment of the risk in order to configure a characteristic of a target sampling process.

In some embodiments of the first system, the one or more processors are configured to apply one or more common modules across two or more models selected from: a data integrity model, a process integrity model, and a policy integrity model.

In some embodiments of the first system, the one or more processors are configured to apply an assurance insight model in order to generate, based at least in part on the generated assessment of risk of material misstatement, assurance insight data.

In some embodiments of the first system, the one or more processors are configured to apply an assurance recommendation model to generate, based at least in part on the assurance insight data, recommendation data.

In some embodiments, a first non-transitory computer-readable storage medium is provided, the first non-transitory computer-readable storage medium storing instructions for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a first data set representing a plurality of statements; receive a second data set comprising corroborating evidence related to one or more of the plurality of statements; and apply one or more integrity analysis models to the first data set and the second data set in order to generate an assessment of a risk that one or more of the plurality of statements represents a material misstatement.

In some embodiments, a first method is provided, the first method being for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, wherein the first method is performed by a system comprising one or more processors, the first method comprising: receiving a first data set representing a plurality of statements; receiving a second data set comprising corroborating evidence related to one or more of the plurality of statements; and applying one or more integrity analysis models to the first data set and the second data set in order to generate an assessment of a risk that one or more of the plurality of statements represents a material misstatement.

In some embodiments, a second system is provided, the second system being for generating an assessment of faithfulness of data, the second system comprising one or more processors configured to cause the second system to: receive a first data set representing a plurality of statements; receive a second data set comprising a plurality of items of corroborating evidence related to one or more of the plurality of statements; generate, for each of the plurality of statements, a respective statement feature vector; generate, for each of the plurality of items of corroborating evidence, a respective evidence feature vector; compute, based on one or more of the statement feature vectors and based on one or more of the evidence feature vectors, a similarity metric representing a level of similarity between a set of one or more of the plurality of statements and a set of one or more of the plurality of items of corroborating evidence; generate, based on the similarity metric, output data representing an assessment of faithfulness of the first data set.

In some embodiments of the second system, generating the output data representing the assessment of faithfulness comprises performing a clustering operation on a set of similarity metrics including the similarity metric.

In some embodiments of the second system, generating the respective statement feature vectors comprises encoding one or more of the following: content information included in the first data set; contextual information included in the first data set; and information received from a data source distinct from the first data set.

In some embodiments of the second system, generating the respective evidence feature vectors comprises encoding one or more of the following: content information included in the second data set; contextual information included in the second data set; and information received from a data source distinct from the second data set.

In some embodiments of the second system, the first data set is selected based on one or more data selection criteria for selecting a subset of available data within a system, wherein the one or more data selection criteria comprise one or more of the following: a data content criterion and a temporal criterion.

In some embodiments of the second system, the second data set comprises data representing provenance of one or more of the items of corroborating evidence.

In some embodiments of the second system, the second data set comprises one or more of the following: structured data, semi-structured data, and unstructured data.

In some embodiments of the second system, the second data set comprises data representing multiple versions of a single document.

In some embodiments of the second system, generating the similarity metric comprises comparing a single one of the statement feature vectors to a plurality of the evidence feature vectors.

In some embodiments of the second system, generating the similarity metric comprises applying dynamic programming.

In some embodiments of the second system, generating the similarity metric comprises applying one or more weights, wherein the weights are determined in accordance with one or more machine learning models.

In some embodiments of the second system, generating the output data representing the assessment of faithfulness comprises generating a confidence score.

In some embodiments of the second system, generating the output data representing the assessment of faithfulness comprises assessing sufficiency of faithfulness at a plurality of levels.

In some embodiments, a second non-transitory computer-readable storage medium is provided, the second non-transitory computer-readable storage medium storing instructions for generating an assessment of faithfulness of data, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a first data set representing a plurality of statements; receive a second data set comprising a plurality of items of corroborating evidence related to one or more of the plurality of statements; generate, for each of the plurality of statements, a respective statement feature vector; generate, for each of the plurality of items of corroborating evidence, a respective evidence feature vector; compute, based on one or more of the statement feature vectors and based on one or more of the evidence feature vectors, a similarity metric representing a level of similarity between a set of one or more of the plurality of statements and a set of one or more of the plurality of items of corroborating evidence; generate, based on the similarity metric, output data representing an assessment of faithfulness of the first data set.

In some embodiments, a second method is provided, the second method being for generating an assessment of faithfulness of data, wherein the second method is performed by a system comprising one or more processors, the second method comprising: receiving a first data set representing a plurality of statements; receiving a second data set comprising a plurality of items of corroborating evidence related to one or more of the plurality of statements; generating, for each of the plurality of statements, a respective statement feature vector; generating, for each of the plurality of items of corroborating evidence, a respective evidence feature vector; computing, based on one or more of the statement feature vectors and based on one or more of the evidence feature vectors, a similarity metric representing a level of similarity between a set of one or more of the plurality of statements and a set of one or more of the plurality of items of corroborating evidence; generating, based on the similarity metric, output data representing an assessment of faithfulness of the first data set.
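
By way of non-limiting illustration only, the following Python sketch shows one possible realization of the faithfulness assessment described above, in which simple bag-of-words term counts stand in for the statement and evidence feature vectors and cosine similarity serves as the similarity metric; the function names (e.g., assess_faithfulness) and the threshold value are illustrative assumptions rather than required features of any embodiment.

# Illustrative sketch only: term-count vectors and cosine similarity stand in
# for the statement/evidence feature encoders and similarity metric described
# above; real embodiments may use learned embeddings and other metrics.
from collections import Counter
from math import sqrt

def feature_vector(text: str) -> Counter:
    """Encode a statement or item of evidence as a simple term-count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Similarity metric between a statement vector and an evidence vector."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def assess_faithfulness(statement: str, evidence_items: list[str], threshold: float = 0.5) -> dict:
    """Compare one statement against all evidence items and report faithfulness."""
    s_vec = feature_vector(statement)
    scores = [cosine_similarity(s_vec, feature_vector(e)) for e in evidence_items]
    best = max(scores, default=0.0)
    return {"similarity": best, "faithful": best >= threshold}

print(assess_faithfulness(
    "Invoice 1001 for 50 units at $20 each",
    ["Purchase order 1001: 50 units, unit price $20", "Bank statement: deposit $1,000"],
))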

In some embodiments, a third system is provided, the third system being for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, the third system comprising one or more processors configured to cause the third system to: receive a first data set representing a plurality of statements; receive a second data set comprising corroborating evidence related to one or more of the plurality of statements; and apply one or more integrity analysis models to the first data set and the second data set in order to generate output data comprising an assessment of risk.

In some embodiments, a third non-transitory computer-readable storage medium is provided, the third non-transitory computer-readable storage medium storing instructions for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a first data set representing a plurality of statements; receive a second data set comprising corroborating evidence related to one or more of the plurality of statements; and apply one or more integrity analysis models to the first data set and the second data set in order to generate output data comprising an assessment of risk.

In some embodiments, a third method is provided, the third method being for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, wherein the third method is performed by a system comprising one or more processors, the third method comprising: receiving a first data set representing a plurality of statements; receiving a second data set comprising corroborating evidence related to one or more of the plurality of statements; and applying one or more integrity analysis models to the first data set and the second data set in order to generate output data comprising an assessment of risk.

In some embodiments, any one or more of the features, characteristics, or aspects of any one or more of the above systems, methods, or non-transitory computer-readable storage media may be combined, in whole or in part, with one another and/or with any one or more of the features, characteristics, or aspects (in whole or in part) of any other embodiment or disclosure herein.

BRIEF DESCRIPTION OF THE FIGURES

Various embodiments are described with reference to the accompanying figures, in which:

FIGS. 1A-1B show a system architecture diagram for a system for providing a composable integrity framework, in accordance with some embodiments.

FIGS. 2A-2B depict a conceptual architecture for a system for providing a composable integrity framework, in accordance with some embodiments.

FIG. 3 depicts a diagram showing the probability of an overall assertion being true using a Bayesian belief network to trace uncertainty in reasoning, in accordance with some embodiments.

FIG. 4 depicts evidence reasoning for revenue and receivables using a Bayesian belief network, in accordance with some embodiments.

FIG. 5 illustrates an example of a computer, in accordance with some embodiments.

DETAILED DESCRIPTION

Described herein are systems and methods for providing AI-augmented auditing platforms, including providing a composable framework for assurance integrity that can be adapted to each of a plurality of FSLIs. Output data generated by the system may comprise an indication as to whether one or more FSLIs (or other assertions analyzed by the system) comprises a material misstatement. Furthermore, systems and methods described herein include systems and methods for semi-automated or fully automated simultaneous vouching and tracing for data integrity. In some embodiments, any one or more of the data integrity techniques discussed herein may be used as part of a composable assurance integrity system. Systems and methods described herein may establish representation faithfulness for financial data that is usable to determine whether there are any material misstatements, e.g., in FSLIs.

Composable Assurance Integrity

Performing audits manually is time-consuming, expensive, prone to introducing human error, and prone to introducing human bias. Furthermore, due to the inherent limitations of manual auditing, sampling approaches are used instead of full-population testing. Sampling approaches attempt to select a representative sample, but there is no way to guarantee that important information is not missed in the data that is not selected for review.

Accordingly, attempts have been made to automate parts of the auditing process. However, the introduction of technologies into audit approaches has mostly focused on substantive testing, control testing, or risk assessment.

Furthermore, in existing audit systems that have introduced one or more technologies, the uncertainties introduced by the technology itself have been largely ignored or, at most, imprecisely and inaccurately accounted for by attempted human review.

Further still, existing audit systems that have introduced one or more technologies have focused on narrow approaches for a single, specific financial statement line item (FSLI). Thus, there is a lack of a consistent framework for addressing 20 or more FSLIs. Narrow solutions for each FSLI audit are difficult to generalize and nearly impossible to scale effectively and economically.

Additionally, insights offered by existing systems for financial data do not distinguish between transactions that have been fully contextualized and those that have not.

Accordingly, there is a need for improved systems and methods that address one or more of the above-identified shortcomings. Specifically, there is a need for an end-to-end transformation of audit approaches based on technologies. There is a need for systems and methods for AI-augmented auditing platforms providing a composable assurance integrity framework, the ability to accurately and automatically account for uncertainties introduced by technologies, the ability to apply in a generalized manner to multiple FSLIs, and the ability to distinguish between transactions that have been fully contextualized as compared to those which have not. The systems and methods described herein may meet one or more of these needs.

In some embodiments, a system for providing an AI-augmented auditing platform is provided. The system includes one or more processors configured to receive input data (e.g., documents, financial statements, other evidence) for an audit, received from one or more data sources. The system is configured to automatically apply one or more data processing operations to the received data to render one or more assessments, scores, and/or adjudications based on the received data. (Any data processing operation referenced herein may include application of one or more models trained by machine-learning.) The system may generate output data indicating one or more results of the data processing operations, and the results may be stored, visualized or otherwise provided to one or more users, and/or used to trigger one or more automated actions by the system. In some embodiments, the system may be configured to review a plurality of FSLIs and to determine, based on evidence data reviewed, whether any of the FSLIs include a material misstatement.

The system may address one or more of the above-identified needs by providing a composable framework that can be adapted to each of a plurality of FSLIs, allowing the system to be flexible and adaptable in addressing a broad spectrum of variations in terms of industry, business practices, and fast-changing business environments. The composable framework provided by the systems described herein may provide a consistent methodology for tracing activities within financial operations of a business and for determining potential materiality of misrepresentation of financial statements. As described herein, the system may be configured to begin an analysis with a chart of accounts, and may trace activities that were recorded to their origins; this allows the system to render determinations as to whether there are any abnormalities in the data. Furthermore, the system may be applicable to both sampling-based testing and full-population testing, due at least to the adaptability and efficiency afforded by the composable framework.

In some embodiments, output data generated by the system may comprise an indication as to whether one or more FSLIs (or other assertions analyzed by the system) comprises a material misstatement, as judged at least in part on the basis of one or both of (a) evidence data processed by the system and (b) uncertainties introduced by one or more technologies (e.g., OCR) applied by the system during the assessment process. In some embodiments, the output data may include a classification of the FSLI—e.g., "does include a misstatement," "does not include a misstatement," "does include a material misstatement," or "does not include a material misstatement." In some embodiments, the output data may include a metric that quantifies or scores the system's assessment as to whether the FSLI includes a material misstatement. In some embodiments, a metric may score the extent of materiality of a misstatement. In some embodiments, a metric may score the system's confidence in the conclusion. In some embodiments, a metric may be based both on the determined level of materiality of a misstatement and on the system's confidence in the conclusion. In some embodiments, an individual output may be provided for each separate FSLI. In some embodiments, a combined or collective output may be provided for a transaction, an account (e.g., including a plurality of transactions), or another overall audit scope as a whole.
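
By way of non-limiting illustration only, the following Python sketch shows one hypothetical way to combine an assessed materiality level with the system's confidence into a single misstatement metric and classification; the specific weighting (a simple product) and the thresholds are illustrative assumptions and are not required by any embodiment.

# Hypothetical scoring sketch: combines an assessed materiality level with the
# system's confidence into a single misstatement metric, as one possible way to
# realize the combined metric described above.
def misstatement_metric(materiality: float, confidence: float) -> dict:
    """Both inputs are assumed to be normalized to [0, 1]."""
    score = materiality * confidence  # simple product; other weightings are possible
    if score >= 0.5:
        label = "does include a material misstatement"
    elif materiality > 0.0:
        label = "does include a misstatement"
    else:
        label = "does not include a misstatement"
    return {"score": round(score, 3), "classification": label}

print(misstatement_metric(materiality=0.8, confidence=0.9))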

In some embodiments, the output data may be stored, visualized or otherwise provided to one or more users, and/or used to trigger one or more automated actions by the system. In some embodiments, the output data may be used to assess a risk profile for a transaction population. In some embodiments, the output data may be used as a basis for target sampling (e.g., to automatically determine an extent of sampling and/or a manner in which sampling is carried out).

In some embodiments, the system may use a composable integrity framework to trace a plurality of transactions (or interactions or statements) end-to-end with corresponding evidentiary data received by the system in order to establish the risk of material misstatement for each transaction (or interaction or statement). In some embodiments, the system may apply one or more standards, thresholds, or criteria to make one or more assessments, for example an assessment as to whether a transaction is successfully verified. In some embodiments, the system may be configurable, in accordance with one or more user inputs (or other triggering events), to set or adjust a standard (e.g., an amount of evidence, a strength of evidence, a matching level, and/or a confidence level) required by the system in order to generate a certain output (e.g., an indication of successful verification).
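
By way of non-limiting illustration only, the following Python sketch shows how a user-adjustable verification standard (amount of evidence, strength of evidence, matching level, and confidence level) might be expressed and applied; the field names and numeric values are illustrative assumptions.

# Illustrative configuration sketch: the field names are assumptions showing how
# a user-adjustable verification standard might be expressed and checked.
verification_standard = {
    "min_evidence_items": 2,       # amount of evidence required per transaction
    "min_evidence_strength": 0.7,  # strength required of each item of evidence
    "min_matching_level": 0.85,    # required match between statement and evidence
    "min_confidence": 0.9,         # confidence required to report successful verification
}

def is_verified(evidence_scores, match_score, confidence, standard=verification_standard):
    return (len(evidence_scores) >= standard["min_evidence_items"]
            and all(s >= standard["min_evidence_strength"] for s in evidence_scores)
            and match_score >= standard["min_matching_level"]
            and confidence >= standard["min_confidence"])

print(is_verified([0.8, 0.75], match_score=0.9, confidence=0.95))  # True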

In some embodiments, the system may apply one or more data processing operations and/or AI models to assess process integrity. Assessing process integrity may comprise tracing changes within an account, for example by using a chart of accounts, in order to trace changes to their source and in order to identify activities that are associated with said changes.
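
By way of non-limiting illustration only, the following Python sketch shows one hypothetical way to trace a recorded change back to its origin by following source references between records; the record identifiers and the structure of the ledger dictionary are illustrative assumptions.

# Minimal tracing sketch, assuming each ledger entry carries a reference to the
# record it was derived from; the record structure is hypothetical.
ledger = {
    "JE-300": {"account": "Revenue", "amount": 1000, "source": "INV-200"},
    "INV-200": {"account": "Accounts Receivable", "amount": 1000, "source": "SO-100"},
    "SO-100": {"account": None, "amount": 1000, "source": None},  # originating sales order
}

def trace_to_origin(entry_id: str, ledger: dict) -> list[str]:
    """Follow source references from an account change back to its origin."""
    chain = []
    while entry_id is not None:
        chain.append(entry_id)
        entry_id = ledger[entry_id]["source"]
    return chain

print(trace_to_origin("JE-300", ledger))  # ['JE-300', 'INV-200', 'SO-100']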

In some embodiments, the system may apply one or more data processing operations and/or AI models to assess data integrity. Assessing data integrity may comprise assessing the fidelity of information in a digital system with respect to the real-world ground truth that the data intends to represent.

In some embodiments, the system may apply one or more data processing operations and/or AI models to assess policy integrity. Assessing policy integrity may comprise adjudicating the evidence data collected in accordance with the process integrity and data integrity processes explained herein, wherein the adjudication is made in accordance with an assurance knowledge substrate. In some embodiments, the assurance knowledge substrate includes the following components: (a) information regarding context of a business, including industry practice, historical behavior, etc., determined using endogenous and/or exogenous information, and (b) one or more accounting policies (e.g., GAAP or IFRS) and/or auditing standards.

In some embodiments, the systems disclosed herein may leverage orchestration in order to enable the reuse and sharing of certain common modules across the process integrity, data integrity, and policy integrity processes for use with multiple different kinds of FSLIs.

In some embodiments, one or more of the three integrity assessments (process, data, and policy) may be applied with respect to a full population of available data (as opposed to selecting a limited, e.g., random, sample of available data for representative testing). In some embodiments, for data that is obtained from enterprise resource planning (ERP) systems or databases—ERP data—only the process integrity assessment may be applied (while data integrity and policy integrity may not be applied to said ERP data). In some embodiments, data integrity processing may be applied when evidence data can be obtained from one or more data sources, such as third-party data sources including banks, shipping carriers, etc.

In some embodiments, the system may be configured to apply a model including an assurance insight layer, wherein the assurance insight layer develops insights with respect to spatial, temporal, spatiotemporal, customer, product, and other attributes. The insights may be developed by this layer at the population level, where the integrity has been analyzed for each transaction.

In some embodiments, the system may be configured to apply a model including an assurance recommendation layer, wherein the assurance recommendation layer generates recommendations, based on audit insight and based on data regarding one or more prior engagements, to be provided to one or more users of the system, for example an audit engagement team or audit client. In some embodiments, the system may be configured such that one or more automated actions are automatically triggered in accordance with the recommendation generated by the recommendation layer (in some embodiments following user input approving the recommendation).

Features and characteristics of some embodiments of systems for providing AI-augmented auditing platforms including a composable assurance integrity framework are provided below with reference to the figures and Appendices herein.

Improved systems and methods such as those disclosed herein may include performing data-driven and AI-augmented audits using full-population testing.

FIGS. 1A-1B show a system architecture diagram for a system 100 for providing a composable integrity framework, in accordance with some embodiments. As shown in FIGS. 1A-1B, an orchestration engine 102 may be communicatively coupled with a process integrity engine 110, a data integrity engine 120, and a policy integrity engine 140. Each of the engines 102, 110, 120, and 140 may include one or more processors (including one or more of the same processors as one another) configured to perform any one or more of the techniques disclosed herein. In some embodiments, engines 110, 120, and/or 140 may be communicatively coupled with one another and/or with orchestration engine 102. In some embodiments, any one or more of the engines of system 100 may be configured to receive user inputs to control functionalities described herein. In some embodiments, orchestration engine 102 may be configured to coordinate cooperative functionalities across engines 110, 120, and/or 140, for example coordinating the exchange of data between said engines and/or controlling the manner in which an output generated by one of said engines may trigger and/or control a functionality of another.
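
By way of non-limiting illustration only, the following Python sketch shows one hypothetical way an orchestration engine such as orchestration engine 102 might coordinate the process, data, and policy integrity engines and route each engine's output to the next; the class and method names, and the stand-in engine callables, are illustrative assumptions.

# Architectural sketch only: illustrates orchestration of the three integrity
# engines; the engines here are stand-in callables, not actual implementations.
class Orchestrator:
    def __init__(self, process_engine, data_engine, policy_engine):
        self.process_engine = process_engine
        self.data_engine = data_engine
        self.policy_engine = policy_engine

    def run(self, transaction):
        process_result = self.process_engine(transaction)
        # Discrepancies found by process integrity may trigger data integrity checks.
        data_result = self.data_engine(transaction) if not process_result["ok"] else None
        # Policy integrity adjudicates using the upstream results.
        return self.policy_engine(transaction, process_result, data_result)

# Usage with stand-in engine callables:
orch = Orchestrator(
    process_engine=lambda t: {"ok": t["erp_amount"] == t["invoice_amount"]},
    data_engine=lambda t: {"vouched": True},
    policy_engine=lambda t, p, d: {"adjudication": "pass" if p["ok"] or (d and d["vouched"]) else "fail"},
)
print(orch.run({"erp_amount": 100, "invoice_amount": 100}))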

Process integrity engine 110 may be configured to perform one or more AI-augmented reconciliation data processing operations in order to generate output data pertaining to ERP data validated against process. Data integrity engine 120 may be configured to perform one or more AI-augmented vouching and tracing data processing operations in order to validate ERP transaction data against source documents. Policy integrity engine 140 may be configured to perform one or more AI-augmented adjudication data processing operations in order to generate (based on one or more accounting standards) recalculated financial statement data and/or discrepancy and anomaly data.

In some embodiments, process integrity engine 110 may comprise ERP data source 112, reconciliation engine 114, and output data store 116. Process integrity engine 110 may be configured to analyze ERP data in order to determine whether the data meets one or more criteria as defined by a process rule set and/or process model.

ERP data source 112 may comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. ERP data source 112 may be communicatively coupled to one or more other components of system 100 and/or engine 110, and may be configured to provide ERP data to reconciliation engine 114, such that the ERP data can be processed by engine 114 to generate output data representing one or more process integrity determinations. In some embodiments, one or more components of system 100 and/or engine 110 may receive ERP data from ERP data source 112 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. ERP data received from ERP data source 112 may be provided in any suitable electronic data format.

In some embodiments, ERP data received from ERP data source 112 may include structured, unstructured, and/or partially-structured (e.g., semi-structured) data. In some embodiments, ERP data received from ERP data source 112 may include data representing one or more of general ledger information, invoice information, accounts receivable information, cash receipts information, and/or inventory information.

In some embodiments, reconciliation engine 114 may comprise any one or more processors configured to accept ERP data from ERP data source 112 as input data and to process said ERP data via one or more data processing operations in order to generate output data indicating whether the ERP data complies with one or more criteria. The one or more criteria may be defined by a user, defined by system settings, defined by third-party input, dynamically determined by the system, and/or defined by one or more predefined standards. In some embodiments, the one or more criteria may include criteria relating to timing (e.g., temporal requirements), order of events/steps, presence or absence of one or more events/steps, agreement of quantity, agreement of price, and/or agreement of amount. The one or more criteria may require that a plurality of representations throughout the available ERP data are consistent with one another (e.g., that the ERP data is internally consistent). The one or more criteria may require that events represented in the ERP data (e.g., events in a business process) occurred in a correct (e.g., predefined) order with respect to one another and/or that there are not any missing events in a predefined required sequence of events. The one or more criteria received by engine 110 may come from any suitable source, such as being input by a user, input by a customer, and/or determined using process mining logic.
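
By way of non-limiting illustration only, the following Python sketch applies two of the kinds of criteria described above (order and completeness of events, and agreement of amount) to a simplified set of ERP event records; the record layout and the expected event sequence are illustrative assumptions and not an exhaustive implementation.

# Illustrative reconciliation checks, assuming a simplified ERP record layout;
# the criteria shown mirror the kinds of criteria described above.
EXPECTED_SEQUENCE = ["sales_order", "shipment", "invoice", "payment"]

def check_process_integrity(events: list[dict]) -> dict:
    kinds = [e["kind"] for e in events]
    order_ok = kinds == [k for k in EXPECTED_SEQUENCE if k in kinds]
    complete = set(EXPECTED_SEQUENCE) <= set(kinds)
    amounts = {e["kind"]: e["amount"] for e in events if "amount" in e}
    amount_ok = len(set(amounts.values())) <= 1  # all recorded amounts agree
    return {"order_ok": order_ok, "complete": complete, "amount_ok": amount_ok}

events = [
    {"kind": "sales_order", "amount": 500},
    {"kind": "shipment"},
    {"kind": "invoice", "amount": 500},
    {"kind": "payment", "amount": 500},
]
print(check_process_integrity(events))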

In some embodiments, reconciliation engine 114 may assess one or more criteria by tracing ERP data backwards through a predefined sequence of events (e.g., moving backwards through a predefined business process starting from revenue and tracing backwards towards payment information).

In some embodiments, reconciliation engine 114 may not assess whether ERP data is substantiated (e.g., vouched) by underlying documentary evidence; instead, reconciliation engine 114 may make assessments for process integrity based entirely on representations made in ERP data itself. In some embodiments, vouching the assessed ERP data against one or more underlying documents may be performed by other components of system 100, such as data integrity engine 120.

Output data generated by reconciliation engine 114 may include electronic data in any suitable format indicating whether one or more assessed process criteria are or are not met by the ERP data that was provided by ERP data source 112. The output data may indicate whether criteria were met (e.g., a binary), an extent to which criteria were met (e.g., a score), a confidence level (e.g., confidence score) associated with one or more determinations, and/or metadata indicating the data and/or criteria and/or data source upon which one or more assessments was rendered.

Output data generated by reconciliation engine 114 may be stored in output data store 116 or in any other suitable computer storage component of system 100 and/or an associated system. Output data generated by reconciliation engine 114 may be transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions. In some embodiments, functionality by data integrity engine 120 and/or policy integrity engine 140 may be triggered by output data generated by reconciliation engine 114; this cooperative functionality may be controlled and coordinated by orchestration engine 102. In some embodiments, if a process integrity criteria is not met, then system 100 may responsively determine (e.g., via data integrity engine 120) whether the ERP information that does not satisfy one or more process integrity criteria can be substantiated by underlying documents (or, e.g., whether the ERP data may in fact be inaccurate). In some embodiments, one or more anomalies indicated by the output data generated by reconciliation engine 114 may be transmitted to and/or displayed to a human user, for example as an alert soliciting manual review.

In some embodiments, analysis performed by process integrity engine 110 may be performed with respect to ERP data for a single transaction and/or with respect to ERP data for a plurality of transactions, for example a cluster of transactions.

In some embodiments, data integrity engine 120 may comprise ERP data source 122, document data source 124, exogenous data sources 126, document understanding engine 128, vouching and tracing engine 130, and output data store 132. Data integrity engine 120 may be configured to analyze ERP data, source document data, and/or exogenous data in order to perform one or more vouching/tracing operations to determine whether ERP data meets one or more vouching data integrity criteria.

ERP data source 122 may in some embodiments comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. ERP data source 122 may be communicatively coupled to one or more other components of system 100 and/or engine 120, and may be configured to provide ERP data thereto. In some embodiments, one or more components of system 100 and/or engine 120 may receive ERP data from ERP data source 122 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. ERP data received from ERP data source 122 may be provided in any suitable electronic data format. In some embodiments, ERP data source 122 may share any one or more characteristics in common with ERP data source 112; in some embodiments, ERP data source 122 may include overlapping data sources with ERP data source 112; in some embodiments, system 100 may rely on a single ERP data source (or a single set of ERP data sources) in place of separate data sources 122 and 112. In some embodiments, ERP data received from ERP data source 122 may include structured, unstructured, and/or partially-structured (e.g., semi-structured) data. In some embodiments, ERP data received from ERP data source 122 may include data representing one or more of sales order information, invoice information, and/or accounts receivable information.

Document data source 124 may in some embodiments comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. Document data source 124 may comprise a source of enterprise content management data. Document data source 124 may be communicatively coupled to one or more other components of system 100 and/or engine 120, and may be configured to provide document data thereto. In some embodiments, one or more components of system 100 and/or engine 120 may receive document data from document data source 124 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. Document data received from document data source 124 may be provided in any suitable electronic data format, including for example word processing document format, spreadsheet document format, CSV document format, PDF document format, and/or image document format. In some embodiments, documents received from document data source 124 may include one or more of purchase order documents, bill of lading documents, and/or bank statement documents.

Exogenous/master data source 126 may in some embodiments comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. Exogenous/master data source 126 may be communicatively coupled to one or more other components of system 100 and/or engine 120, and may be configured to provide exogenous data and/or master data thereto. In some embodiments, one or more components of system 100 and/or engine 120 may receive exogenous data and/or master data from exogenous/master data source 126 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. Exogenous/master data received from exogenous data source 126 may be provided in any suitable electronic data format. In some embodiments, data received from exogenous data source 126 may include data representing customer information and/or product information.

In some embodiments, exogenous data from exogenous/master data source 126 may comprise data from a third-party data source and/or third-party organization, i.e., data that is external to a specific client. Exogenous data may include public SEC filing data (e.g., the EDGAR database), data from public internet resources, or the like. In some embodiments, master data from exogenous/master data source 126 may comprise endogenous data from a data source associated with a party relevant to the analysis being performed by system 100 (e.g., from a customer data source). Master data may include master customer data, master vendor data, and/or master product data.

In some embodiments, document understanding engine 128 may comprise any one or more processors configured to accept document data from document data source 124 and/or exogenous data from exogenous/master data source 126 as input data and to process said received data via one or more data processing operations in order to extract output data. The one or more data processing operations may include one or more document preprocessing operations, character recognition operations, information extraction operations, and/or natural language understanding models. In some embodiments, the one or more data processing operations applied by document understanding engine 128 may be defined by a user, defined by system settings, defined by third-party input, and/or dynamically determined by the system. Document understanding engine 128 may generate output data representing information extracted from the input documents, and said output data may be transmitted to vouching and tracing engine 130 for further processing as described below. In some embodiments, output data generated by document understanding engine 128 may be in the form of a tuple (e.g., indicating entity name, location, entity value, and a confidence level associated with one or more of said values).
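
By way of non-limiting illustration only, the following Python sketch emits output tuples of the general form described above (entity name, location, entity value, confidence); the regular-expression extraction and the hard-coded confidence values are illustrative stand-ins for the document understanding models, not an actual implementation of them.

# Illustrative extraction sketch producing (entity name, location, value,
# confidence) tuples; the regexes and confidence values are assumptions.
import re

def extract_entities(page_text: str, page_number: int = 1):
    tuples = []
    match = re.search(r"Invoice\s+No\.?\s*([\w-]+)", page_text)
    if match:
        tuples.append(("invoice_number", {"page": page_number, "span": match.span(1)},
                       match.group(1), 0.97))
    match = re.search(r"Total\s*[:=]?\s*\$?([\d,]+\.\d{2})", page_text)
    if match:
        tuples.append(("total_amount", {"page": page_number, "span": match.span(1)},
                       match.group(1), 0.92))
    return tuples

print(extract_entities("Invoice No. INV-200 ... Total: $1,000.00"))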

Vouching and tracing engine 130 may comprise any one or more processors configured to accept input data comprising ERP data and document data, and to process the input data to determine whether one or more vouching and tracing criteria are met. In some embodiments, assessing the one or more vouching or tracing criteria may comprise determining whether the ERP data is substantiated (e.g., vouched) by the document data. In some embodiments, vouching and tracing engine 130 may accept input data from ERP data source 122 and from document understanding engine 128. Vouching and tracing engine 130 may process said input data via one or more vouching and/or tracing data processing operations, thereby generating output data that indicates whether (or an extent to which) one or more vouching and/or tracing criteria are met. In some embodiments, the one or more data processing operations applied by vouching and tracing engine 130 may be defined by a user, defined by system settings, defined by third-party input, and/or dynamically determined by the system. Vouching and tracing engine 130 may generate output data comprising an indication of whether the assessed criteria are met (e.g., a binary indication), an extent to which the assessed criteria are met (e.g., a vouching score), associated confidence scores, and/or associated metadata indicating the underlying data on which the output data is based. In some embodiments, output data generated by vouching and tracing engine 130 may be in the form of a tuple (e.g., indicating entity name, location, entity value, and a confidence level associated with one or more of said values).

In some embodiments, vouching and tracing engine 130 may assess existence criteria, completeness criteria, and/or accuracy criteria for any one or more assertions (and/or for any set (e.g., cluster) of assertions). Existence criteria may assess whether evidence for an assertion exists; completeness criteria may assess whether all required evidence and all required components related to an assertion are present; and accuracy criteria may assess whether evidence indicates substantive informational content that is consistent with the assertion. In some embodiments, vouching and tracing engine 130 may apply one or more vouching and/or tracing operations as described in U.S. patent application titled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ASSESSMENT OF VOUCHING EVIDENCE," filed Jun. 30, 2022, Atty. Docket No. 13574-20068.00, the entire contents of which are incorporated herein by reference.
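
By way of non-limiting illustration only, the following Python sketch evaluates the existence, completeness, and accuracy criteria named above for a single assertion and its extracted evidence fields; the data shapes and field names are illustrative assumptions.

# Hypothetical sketch of the three vouching criteria applied to one assertion;
# the assertion/evidence dictionaries are assumed shapes, not actual schemas.
def assess_vouching(assertion: dict, evidence: list[dict], required_fields: set) -> dict:
    exists = len(evidence) > 0                                     # existence: any evidence at all
    found_fields = {f for doc in evidence for f in doc}
    complete = required_fields <= found_fields                     # completeness: all required fields present
    accurate = all(doc.get(k) == v for doc in evidence             # accuracy: evidence agrees with the assertion
                   for k, v in assertion.items() if k in doc)
    return {"existence": exists, "completeness": complete, "accuracy": accurate}

assertion = {"invoice_number": "INV-200", "amount": 1000}
evidence = [{"invoice_number": "INV-200", "amount": 1000, "date": "2021-06-30"}]
print(assess_vouching(assertion, evidence, required_fields={"invoice_number", "amount", "date"}))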

Output data generated by vouching and tracing engine 130 may be stored in output data store 132 or in any other suitable computer storage component of system 100 and/or an associated system. Output data generated by vouching and tracing engine 130 may be transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions. In some embodiments, functionality by process integrity engine 110 and/or policy integrity engine 140 may be triggered by output data generated by vouching and tracing engine 130; this cooperative functionality may be controlled and coordinated by orchestration engine 102. In some embodiments, one or more anomalies indicated by the output data generated by vouching and tracing engine 130 may be transmitted to and/or displayed to a human user, for example as an alert soliciting manual review.

In some embodiments, analysis performed by data integrity engine 120 may be performed with respect to data for a single transaction and/or with respect to data for a plurality of transactions, for example a cluster of transactions.

In some embodiments, policy integrity engine 140 may comprise adjudication engine 142, criteria data source 144, revised output data store 146, and output discrepancies and anomalies data store 148. Policy integrity engine 140 may be configured to analyze ERP data and/or source document data in order to perform one or more policy integrity data processing operations to determine whether the input data meets one or more policy integrity criteria.

Criteria data source 144 may comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. Criteria data source 144 may be communicatively coupled to one or more other components of system 100 and/or engine 140, and may be configured to provide criteria data thereto. In some embodiments, one or more components of system 100 and/or engine 140 may receive criteria data from criteria data source 144 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. Criteria data received from criteria data source 144 may be provided in any suitable electronic data format, for example including one or more structured, unstructured, and/or partially structured documents. In some embodiments, engine 140 may generate rule sets for policy integrity criteria by extracting rules from documents received from criteria data source 144.

Adjudication engine 142 may comprise any one or more processors configured to accept input data comprising ERP data, document data, and/or data generated by process integrity engine 110 and/or data integrity engine 120, and to process said input data to determine whether one or more policy integrity criteria are met. In some embodiments, assessing the one or more policy integrity criteria may comprise determining whether the input data indicates that one or more processes represented by the input data complies with temporal criteria, order-of-operations criteria, disclosure criteria, related-parties criteria, collectability criteria, internal consistency criteria, transfer-of-title criteria, commercial substance criteria, and/or consideration/payment/collectability criteria. In some embodiments, assessing consideration may comprise assessing fixed consideration and/or variable consideration.

In some embodiments, adjudication engine 142 may accept, as input data, the output data that was generated by process integrity engine 110 and/or data integrity engine 120, and may process said received data in order to perform one or more data processing operations comprising a “tie out” operation and/or a “roll forward” operation in terms of tracing a transaction through a business process. Data indicating discrepancies and/or inconsistencies, as generated by process integrity engine 110 and/or data integrity engine 120, may become input data for adjudication engine 142.

In some embodiments, adjudication engine 142 may accept, as input data, standards data from criteria data source 144.

In some embodiments, adjudication engine 142 may accept, as input data, additional input data regarding related transactions, for example in the case of a transaction involving multiple shipments, returns/refunds, and/or a single payment for multiple transactions. In some embodiments, related transaction data may be required as input in accordance with an accounting principle and/or auditing principle being applied in accordance with criteria received from data source 144.

In some embodiments, adjudication engine 142 may be implemented by an inference engine where rules may be triggered by the inputs received (e.g., inputs indicating discrepancies and inconsistencies discovered by process integrity engine 110 and/or data integrity engine 120 and/or inputs indicating additional transactional data).
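
By way of non-limiting illustration only, the following Python sketch shows a minimal forward-chaining style of rule evaluation in which rules are triggered by input facts such as upstream discrepancy findings; the rule conditions and conclusions are illustrative assumptions rather than the actual rule base of any embodiment.

# Minimal inference-engine sketch: rules are (condition, conclusion) pairs over
# a fact dictionary; conditions test findings from the upstream engines.
RULES = [
    (lambda f: f.get("process_discrepancy") and not f.get("vouched"),
     {"finding": "potential misstatement", "requires_review": True}),
    (lambda f: f.get("process_discrepancy") and f.get("vouched"),
     {"finding": "ERP record likely inaccurate", "requires_review": True}),
    (lambda f: not f.get("process_discrepancy"),
     {"finding": "no exception", "requires_review": False}),
]

def adjudicate(facts: dict) -> dict:
    for condition, conclusion in RULES:
        if condition(facts):
            return conclusion
    return {"finding": "indeterminate", "requires_review": True}

print(adjudicate({"process_discrepancy": True, "vouched": False}))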

In some embodiments, adjudication engine 142 may consider implicit/explicit variable considerations, which may include various forms of discount (e.g., including discounts captured in the original purchase order and/or invoice, discount rules in pricing, discount rules for a customer, implicit discount not captured elsewhere, and/or discount that was applied to settle a transaction when there is discrepancy between the invoice and payment). The actual revenue that may be accrued may be the amount in the invoice adjusted by all forms of discount.
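
By way of non-limiting illustration only, the following Python sketch shows the arithmetic described above, in which the revenue that may be accrued is the invoice amount adjusted by all identified forms of discount; the discount categories and amounts are illustrative assumptions.

# Worked arithmetic sketch: accrued revenue as the invoice amount adjusted by
# all forms of discount; the categories below are examples only.
invoice_amount = 10_000.00
discounts = {
    "purchase_order_discount": 500.00,   # captured in the original PO/invoice
    "pricing_rule_discount": 200.00,     # discount rules in pricing
    "customer_discount": 100.00,         # discount rules for this customer
    "settlement_discount": 50.00,        # applied to settle an invoice/payment discrepancy
}
accrued_revenue = invoice_amount - sum(discounts.values())
print(accrued_revenue)  # 9150.0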

In some embodiments, adjudication engine 142 may consider non-cash consideration, which may include in-kind exchange.

In some embodiments, adjudication engine 142 may assess input data according to a multi-step process. In some embodiments, in step one, adjudication engine 142 may assess whether a contract exists, for example by assessing one or more transfer-of-title, commercial-substance, and/or consideration criteria. In some embodiments, in step two, adjudication engine 142 may identify a plurality of obligations for the contract, including for example a good (that is distinct), a service (that is distinct), a bundle of goods or services (that is distinct), and/or a series of distinct goods or services that are substantially the same and that have the same pattern of transfer to a customer. In some embodiments, in step three, adjudication engine 142 may identify a transaction price for the contract. In some embodiments, in step four, adjudication engine 142 may allocate the transaction price to obligations that have been fulfilled. In some embodiments, in step five, the corresponding transaction price is mapped onto the performance obligation, which may be the final step in recognizing revenue for each of the performance obligations that are satisfied.
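
By way of non-limiting illustration only, the following Python sketch walks through the five steps described above for a single hypothetical contract, allocating the transaction price pro rata by standalone price and recognizing revenue only for satisfied obligations; the contract structure and the allocation rule are illustrative assumptions.

# Illustrative sketch of the five-step assessment; each step is a placeholder
# for the corresponding adjudication logic, and the contract dict is assumed.
def recognize_revenue(contract: dict) -> list[dict]:
    # Step 1: assess whether a contract exists (commercial substance, consideration, etc.).
    if not contract.get("has_commercial_substance") or not contract.get("consideration"):
        return []
    # Step 2: identify the performance obligations in the contract.
    obligations = contract["performance_obligations"]
    # Step 3: identify the transaction price.
    price = contract["consideration"]
    # Step 4: allocate the transaction price across obligations (pro rata by standalone price here).
    total_standalone = sum(o["standalone_price"] for o in obligations)
    # Step 5: recognize revenue for each obligation that has been satisfied.
    return [{"obligation": o["name"],
             "recognized": round(price * o["standalone_price"] / total_standalone, 2)}
            for o in obligations if o["satisfied"]]

contract = {
    "has_commercial_substance": True,
    "consideration": 1200.00,
    "performance_obligations": [
        {"name": "hardware", "standalone_price": 1000.00, "satisfied": True},
        {"name": "support plan", "standalone_price": 250.00, "satisfied": False},
    ],
}
print(recognize_revenue(contract))  # hardware: 1200 * 1000/1250 = 960.0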

In some embodiments, adjudication engine 142 may assess whether one or more contracts should be combined into a single contract.

In some embodiments, adjudication engine 142 may apply one or more adjudication operations as described in U.S. patent application titled "AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ADJUDICATION OF COMMERCIAL SUBSTANCE, RELATED PARTIES, AND COLLECTABILITY," filed Jun. 30, 2022, Atty. Docket No. 13574-20069.00, the entire contents of which are incorporated herein by reference.

Adjudication engine 142 may process said input data (e.g., ERP data, document data, and/or output data generated by one or both of engines 110 and 120) via one or more policy integrity data processing operations, thereby generating output data that indicates whether (or an extent to which) one or more policy integrity criteria are met. In some embodiments, the one or more data processing operations applied by adjudication engine 142 may be defined by a user, defined by system settings, defined by third-party input, and/or dynamically determined by the system. In some embodiments, a user may select policy criteria that may include one or more accounting standards and/or one or more auditing standards. Adjudication engine 142 may generate output data comprising an indication of whether the assessed criteria are met (e.g., a binary indication), an extent to which the assessed criteria are met (e.g., a score), associated confidence scores, and/or associated metadata indicating the underlying data on which the output data is based. In some embodiments, the output data generated by adjudication engine 142 may be in the form of a tuple (e.g., indicating entity name, location, entity value, and a confidence level associated with one or more of said values).

In some embodiments, output data generated by adjudication engine 142 may include a revised version of a document and/or ERP data that was inputted into adjudication engine 142, wherein the document and/or ERP data is revised to comply with one or more policy integrity standards. In some embodiments, output data generated by adjudication engine 142 may include a recalculated financial statement. Such output data may in some embodiments be transmitted to revised output data store 146.

In some embodiments, output data generated by adjudication engine 142 may include an indication of one or more discrepancies and/or anomalies, for example an indication as to one or more pieces of input data that did not comply with one or more policy integrity criteria. Discrepancies and/or anomalies may be transmitted to discrepancies and anomalies data store 148 for storage, and/or may be transmitted to or displayed to a user (for example via an alert advising manual review).

Output data generated by adjudication engine 142 may be stored in output data store 146 and/or 148, and/or in any other suitable computer storage component of system 100 and/or an associated system. Output data generated by adjudication engine 142 may be transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions. In some embodiments, functionality by process integrity engine 110 and/or data integrity engine 120 may be triggered by output data generated by adjudication engine 142; this cooperative functionality may be controlled and coordinated by orchestration engine 102. In some embodiments, one or more anomalies indicated by the output data generated by adjudication engine 142 may be transmitted to and/or displayed to a human user, for example as an alert soliciting manual review.

In some embodiments, analysis performed by policy integrity engine 140 may be performed with respect to data for a single transaction and/or with respect to data for a plurality of transactions, for example a cluster of transactions.

Policy integrity criteria data source 144 may in some embodiments comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. Policy integrity criteria data source 144 may be communicatively coupled to one or more other components of system 100 and/or engine 140, and may be configured to provide policy criteria data thereto. In some embodiments, one or more components of system 100 and/or engine 140 may receive criteria data from policy integrity criteria data source 144 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. Policy criteria data received from policy integrity criteria data source 144 may be provided in any suitable electronic data format. In some embodiments, criteria data received from data source 144 may include structured, unstructured, and/or partially-structured (e.g., semi-structured) data.

In some embodiments, system 100 may provide one or more user-facing options such that a user of the system can configure the system to customize it for particular use-cases. For example, a user may select from available data sources, may select from available criteria, and may configure the manner in which one or more criteria are assessed. In some embodiments, a user may be able to choose whether (and/or an extent to which) one or more criteria needs to be satisfied. In some embodiments, a user may be able to select what data does and does not need to be tied out. A user may be able to configure system 100 in order to control what data is assessed in data integrity assessments, for example in controlling whether all data is assessed and whether one or more confidence levels below 100% is considered acceptable for data integrity assessments. A user may be able to configure system 100 in order to control what policies (e.g., what standards) are applied for the purposes of policy integrity assessments.

In some embodiments, system 100 may allow users to selectively perform one or more of: process integrity, data integrity, and policy integrity. In some embodiments, one portion of system 100 may be applied without applying other portions. For example, in a case in which ERP data is available but underlying documents data is not available, system 100 may apply process integrity assessments and/or policy integrity assessments without applying any data integrity assessments.

In some embodiments, output data generated by engine 110, engine 120, and/or engine 140 may be used to generate an overall risk assessment score. In some embodiments, output data generated by one or two of the engines 110, 120, or 140 may be sufficient to indicate a high enough level of risk such that assessment by the remaining engine(s) is not applied. In some embodiments, output data generated by one or two of the engines 110, 120, or 140 may be sufficient to indicate a low enough level of risk such that assessment by the remaining engine(s) is not applied.
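
The following is a hypothetical sketch of how per-engine risk scores might be combined into an overall risk assessment, with remaining engines skipped once the risk is already conclusively high or low; the scores, thresholds, and averaging scheme are illustrative assumptions only.

    # Hypothetical sketch of combining per-engine risk scores (0.0 = low risk,
    # 1.0 = high risk) into an overall assessment, skipping remaining engines once
    # the risk is already conclusively high or low. Thresholds are illustrative.
    HIGH, LOW = 0.9, 0.1

    def overall_risk(engines):
        """engines: iterable of callables, each returning a risk score in [0, 1]."""
        scores = []
        for run_engine in engines:
            score = run_engine()
            scores.append(score)
            if score >= HIGH or score <= LOW:
                break  # remaining engines need not be applied
        return sum(scores) / len(scores)

    risk = overall_risk([lambda: 0.95, lambda: 0.4, lambda: 0.2])
    print(risk)  # 0.95 -- the first engine already indicated high risk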

FIGS. 2A-2B depict a conceptual architecture for a system 200 for providing a composable integrity framework, in accordance with some embodiments. As shown in FIGS. 2A-2B, system 200 may include data lake layer 202; knowledge substrate layer 208; integrity microservices layer 210; normalization, contextualization, and integrity verification layer 212; insight microservices layer 220; and recommendation layer 222. FIG. 2A shows layers at the bottom of the architecture, while FIG. 2B shows layers at the top of the architecture.

Data lake layer 202 may in some embodiments comprise endogenous data sources 204 and exogenous data sources 206, each of which may comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. Data sources 204 and/or 206 may be communicatively coupled to one or more other components of system 200, and may be configured to provide data thereto. In some embodiments, one or more components of system 200 may receive data from data sources 204 and/or 206 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. Data received from data sources 204 and/or 206 may be provided in any suitable electronic data format. In some embodiments, data received from data sources 204 and/or 206 may include structured, unstructured, and/or partially-structured (e.g., semi-structured) data. In some embodiments, endogenous data source 204 may provide data including internal data sourced directly from the party to whom it pertains, for example ERP representations from a party. In some embodiments, exogenous data source 206 may provide data including external data sourced from third-party sources other than the party to whom the data pertains.

Knowledge substrate layer 208 may comprise one or more processors and one or more data stores. Knowledge substrate layer 208 may comprise one or more processors configured to receive data from data sources 204 and/or 206 and to process said data to generate processed endogenous/exogenous knowledge data, including for example master data, ontology/dictionary data, case library data, curated document data, process knowledge data, and/or accounting/auditing standard data.

Integrity microservices layer 210 may comprise one or more processors and one or more data stores. Integrity microservices layer 210 may comprise one or more processors configured to receive data from data source 204, data source 206, and/or knowledge substrate layer 208. The one or more processors of microservices layer 210 may apply one or more data processing operations to the received data to generate output data. In some embodiments, microservices layer 210 may apply one or more microservices including, for example: open-source microservices (e.g., OpenCV, Tesseract, NLTK); vendor tools (e.g., ABBYY, Tableau); and/or custom tools (e.g., InfoExtract).

Normalization, contextualization, and integrity verification layer 212 may comprise one or more processors and one or more data stores. Normalization, contextualization, and integrity verification layer 212 may comprise one or more processors configured to receive input data (e.g., from one or more of the underlying layers 202, 208, and/or 210 in system 200 and/or from one or more external data sources) and to apply one or more integrity assessment data processing models configured to generate output data providing an indication of whether (and/or an extent to which) one or more integrity criteria are satisfied by the input data. In some embodiments, layer 212 may generate an overall risk score indicating a risk associated with a transaction (or with a set of transactions).

In some embodiments, layer 212 may share any one or more characteristics in common with system 100 described above with respect to FIGS. 1A-1B. In some embodiments, layer 212 may comprise process integrity engine 214 (which may share any one or more characteristics in common with process integrity engine 110 described above with respect to FIGS. 1A-1B), data integrity engine 216 (which may share any one or more characteristics in common with data integrity engine 120 described above with respect to FIGS. 1A-1B), and policy integrity engine 218 (which may share any one or more characteristics in common with policy integrity engine 140 described above with respect to FIGS. 1A-1B).

Insight microservices layer 220 may comprise one or more processors and one or more data stores. Insight microservices layer 220 may comprise one or more processors configured to receive input data (e.g., from one or more of the underlying layers 202, 208, 210, and/or 212 in system 200 and/or from one or more external data sources) and to apply one or more data processing models to generate insight data. In some embodiments, insight microservices layer 220 may apply one or more clustering operations configured to cluster transactions based on customer, product, time, location, or other suitable clustering criteria. In some embodiments, insight microservices layer 220 may extract behavior for a population and/or subpopulation of transactions from layer 212. Transactions may be clustered based on time, location, amount, product, client, vendor, and/or any other attribute or combination of attributes.
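
As an illustrative sketch of the clustering described above, the following example groups a handful of transactions by customer, product, and amount using scikit-learn's KMeans; the feature encoding and sample data are simplified assumptions, not the insight layer's actual implementation.

    # Illustrative sketch of clustering transactions by customer, product, and
    # amount, as the insight layer might do. Uses scikit-learn's KMeans purely as
    # an example; the feature encoding and sample data are simplified assumptions.
    import numpy as np
    from sklearn.cluster import KMeans

    transactions = [
        {"customer": "A", "product": "P1", "amount": 120.0},
        {"customer": "A", "product": "P1", "amount": 118.0},
        {"customer": "B", "product": "P2", "amount": 980.0},
        {"customer": "B", "product": "P2", "amount": 1015.0},
    ]

    customers = sorted({t["customer"] for t in transactions})
    products = sorted({t["product"] for t in transactions})

    def encode(t):
        # One-hot encode customer and product, then append a crudely scaled amount.
        return ([1.0 if t["customer"] == c else 0.0 for c in customers]
                + [1.0 if t["product"] == p else 0.0 for p in products]
                + [t["amount"] / 1000.0])

    X = np.array([encode(t) for t in transactions])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(labels)  # two clusters: the small A/P1 sales vs. the large B/P2 sales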

Recommendation layer 222 may comprise one or more processors and one or more data stores. Recommendation layer 222 may comprise one or more processors configured to receive input data (e.g., from one or more of the underlying layers 202, 208, 210, 212, and/or 220 in system 200 and/or from one or more external data sources) and to apply one or more data processing models to generate recommendation data. The output generated by recommendation layer 222 may include one or more remediation actions based on the output from the underlying layers (e.g., 220 and 212). In some embodiments, recommendation data may comprise data included in an alert transmitted to and/or displayed to a human user or analyst in order to prompt further review.

FIG. 3 depicts a diagram showing the probability of an overall assertion being true using a Bayesian belief network to trace uncertainty in reasoning, in accordance with some embodiments. Data analysis in accordance with this network may be applied by one or more data processing engines of the system. FIG. 3 depicts how an overall probability may be determined based on a plurality of underlying probabilities, for example including a probability that an existence (valuation) assertion is true, a probability that a cutoff assertion is true, and/or a probability that an accuracy assertion is true.
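
As a worked illustration, the following minimal sketch combines per-assertion probabilities into an overall probability under a simplifying independence assumption; a Bayesian belief network as depicted in FIG. 3 would additionally capture dependencies among the assertions. All values are hypothetical.

    # Minimal sketch of combining per-assertion probabilities into the probability
    # that the overall assertion is true, assuming (for illustration only) that
    # the assertions are independent; a Bayesian belief network as in FIG. 3 would
    # additionally capture dependencies among them.
    assertion_probs = {
        "existence_valuation": 0.98,
        "cutoff": 0.95,
        "accuracy": 0.97,
    }

    p_overall = 1.0
    for p in assertion_probs.values():
        p_overall *= p
    print(round(p_overall, 4))  # ~0.9031 under the independence assumption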

FIG. 4 depicts evidence reasoning for revenue and receivables using a Bayesian belief network, in accordance with some embodiments. Data analysis in accordance with this network may be applied by one or more data processing engines of the system.

Financial statements may include the following components:

    • A balance sheet, or statement of financial position, reports on a company's assets, liabilities, and owners' equity at a given point in time.
    • An income statement—or profit and loss report (P&L report), or statement of comprehensive income, or statement of revenue & expense—reports on a company's income, expenses, and profits over a stated period. A profit and loss statement provides information on the operation of the enterprise. These include sales and the various expenses incurred during the stated period.
    • A statement of changes in equity or statement of equity, or statement of retained earnings, reports on the changes in equity of the company over a stated period.
    • A cash flow statement reports on a company's cash flow activities, particularly its operating, investing and financing activities over a stated period.
    • A comprehensive income statement involves those other comprehensive income items which are not included while determining net income.

The headings of these financial statements are the “line items”, which may include the following:

1. Revenue

2. Cost of Sales

3. Gross Profit

4. Admin Expenses

5. Selling Expenses

6. Operating Profit

7. Finance Cost

8. Profit before tax

These line items can be mapped to different parts of the financial statements. As an example, the cash asset line item is mapped to the Balance Sheet and the Statement of Cash Flow. Financial statement line items can be mapped to various parts of the chart of accounts. Within the chart of accounts, there are balance sheet accounts, which may be needed to create a balance sheet:

    • 1. Asset accounts record any resources your company owns that provide value to the company. They can be physical assets like land, equipment and cash, or intangible things like patents, trademarks and software.
    • 2. Liability accounts are a record of all the debts the company owes. Liability accounts usually have the word “payable” in their name—accounts payable, wages payable, invoices payable. “Unearned revenues” are another kind of liability account—usually cash payments that your company has received before services are delivered.
    • 3. Equity accounts are a little more abstract. They represent what's left of the business after subtracting all of the company's liabilities from its assets. They basically measure how valuable the company is to its owner or shareholders.

Separately, the income statement accounts include the following:

    • Revenue accounts keep track of any income the business brings in from the sale of goods, services or rent.
    • Expense accounts are all the money and resources the business spends in the process of generating revenues, i.e. utilities, wages and rent.

Case Study: Revenue & Receivable Audit

Auditing of financial statement line items such as revenue may need to establish the following assertions:

    • Occurrence: Have the transactions occurred and do they pertain to the entity?
    • Completeness: Have all transactions been recorded?
    • Accuracy: Have transactions been accurately recorded?
    • Cutoff: Have transactions been recorded in the correct accounting period?
    • Classification: Have transactions been recorded in the proper accounts?

In order to conduct an audit of a financial statement line item, such as revenue and receivables, substantive testing on the receivables and revenue is conducted to establish the assertions above:

Substantive tests of revenue to establish occurrence, accuracy, valuation:

    • Vouch recorded sales transaction back to customer order and shipping document
    • Compare quantities billed and shipped with customer order
    • Special care should be given to sales recorded at the end of the year for cutoff
    • Scan sales journal for duplicate entries

Substantive tests of revenue cutoff tests:

    • Can be performed for sales, sales returns, cash receipts
    • Provides evidence whether transactions are recorded in the proper period
    • Cutoff period is usually several days before and after balance sheet date
    • Extent of cutoff tests depends on effectiveness of client controls
    • Sales cutoff
      • Auditor selects sample of sales recorded during cutoff period and vouches back to sales invoice and shipping documents to determine whether sales are recorded in proper period
      • Cutoff tests assertions of existence and completeness
      • Auditor may also examine terms of sales contracts
    • Sales return cutoff
      • Client should document return of goods using receiving reports
      • Reports should include the date, description, condition, and quantity of goods
      • Auditor selects sample of receiving reports issued during cutoff period and determines whether credit was recorded in the correct period

Substantive Tests of Revenue for Completeness:

    • Use of pre-numbered documents is important
    • Analytical procedures
    • Cutoff tests
    • Auditor selects sample of shipping documents and traces them into the sales journal to test completeness of recording of sales

Substantive Tests of Accounts Receivable Existence & Occurrence:

    • Valuation
      • Are sales and receivables initially recorded at their correct amount?
      • Will the client collect the full amount of recorded receivables (i.e. collectability)?
    • Rights and Obligations
      • Contingent liabilities associated with factoring or sales arrangements
      • Discounted receivables
    • Presentation and Disclosure
      • Pledged, discounted, assigned, or related party receivables

Substantive Tests of Accounts Receivable:

    • Obtain and evaluate aging of accounts receivable
    • Confirm receivables with customers
    • Perform cutoff tests
    • Review subsequent collections of receivables

Regarding aging accounts receivable, because receivables are reported at net realizable value, auditors must evaluate management estimates of uncollectible accounts:

    • Auditor will obtain or prepare schedule of aged accounts receivable
      • If schedule is prepared by client, it is tested for mathematical and aging accuracy
    • Aging schedule can be used to
      • Agree detail to control account balance
      • Select customer balances for confirmation
      • Identify amounts due from related parties for disclosure
      • Identify past-due balances
    • Auditor evaluates percentages of uncollectibility
    • Auditor then recalculates balance in the Allowance account

Regarding aging accounts receivable, additional substantive tests may involve confirming receivables with customers:

    • Confirmations provide reliable external evidence about the
      • Existence of recorded accounts receivable and
      • Completeness of cash collections, sales discounts, and sales returns and allowances
    • Confirmations are required by GAAS unless one of the following is present:
      • Receivables are not material
      • Use of confirmations would be ineffective
      • Environment risk is assessed as low and sufficient evidence is available from using other substantive tests

Types of confirmation may include positive confirmations:

    • Customers are asked to agree the amount on the confirmation with their accounting records and to respond directly to the auditor whether they agree with the amount or not
    • Positive confirmation requires a response
    • If customer does not respond, auditor must use alternative procedures

Types of confirmation may include negative confirmations:

    • Customers are asked to respond only if they disagree with the balance (non-response is assumed to mean agreement)
    • Less expensive since there are no additional procedures if customer does not respond
    • May be used when all of the following are present
      • Confirming a large number of small customer balances
      • Environment risk for receivables is assessed as low
      • Auditor believes customers will give proper attention to confirmations

Types of confirmation may include follow-up procedures for non-responses:

    • If customer does not respond to positive confirmation, auditor may send a second, or even third, request
    • If customer still does not respond, auditor will use alternative procedures
    • Examine the cash receipts journal for cash collected after year-end
    • Care is taken to ensure receipt is year-end receivable, not subsequent sale
    • Examine documents supporting receivable (purchase order, sales invoice, shipping documents) to determine if sale occurred prior to year-end
    • Evidence gathered from internal documents is not considered as reliable as externally provided evidence

Sampling For Substantive Testing

In PCAOB AS 2315, audit sampling is defined as the “application of an audit procedure to less than 100 percent of the items within an account balance or class of transactions for the purpose of evaluating some characteristic of the balance or class.” Sampling is also one of the reasons that the audit results can only achieve reasonable assurance as opposed to absolute assurance.

Reasonable assurance is a high level of assurance regarding material misstatements, but not an absolute one. Reasonable assurance includes the understanding that there is a remote likelihood that material misstatements will not be prevented or detected on a timely basis. To achieve reasonable assurance, the auditor needs to obtain sufficient appropriate audit evidence to reduce audit risk to an acceptably low level. This means that there is some uncertainty arising from the use of sampling, since it is possible that a material misstatement will be missed. On the other hand, absolute assurance provides a guarantee that the financial statements are free from material misstatements.

Absolutes are not attainable due to factors such as the need for professional judgment, the use of testing, the inherent limitations of internal control, the reliance in accounting on estimates, and the fact that audit evidence is generally persuasive rather than conclusive.

Some insight into what reasonable assurance means to the auditor may be gained by recognizing that it is the complement of audit risk: Audit Risk + Assurance Level = 100%.

Audit risk is defined in AU sec. 312, Audit Risk and Materiality in Conducting an Audit, as “the risk that the auditor may unknowingly fail to appropriately modify his or her opinion on financial statements that are materially misstated.” Because the auditor must limit overall audit risk to a low level, reasonable assurance must be at a high level. Stated in mathematical terms, if audit risk is 5 percent, then the level of assurance is 95 percent.

In general, audit risk is the product of inherent risk, control risk, and detection risk:

(Audit Risk) = (Risk of Material Misstatement) * (Detection Risk)

where:

(Risk of Material Misstatement) = (Inherent Risk of Material Misstatement) * (Control Risk)

Risk of material misstatement, or RMM, is composed of the inherent risk of material misstatement and the control risk, where control risk is the risk that the audit client's controls will not prevent or detect the material misstatement.
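
The following worked example applies the audit-risk relationships above; the numeric values are illustrative only.

    # Worked example of the audit-risk relationships above; all values are
    # illustrative only.
    inherent_risk = 0.6    # inherent risk of material misstatement
    control_risk = 0.4     # risk that client controls fail to prevent/detect it
    detection_risk = 0.2   # risk that audit procedures fail to detect it

    rmm = inherent_risk * control_risk      # risk of material misstatement
    audit_risk = rmm * detection_risk
    assurance_level = 1.0 - audit_risk      # Audit Risk + Assurance Level = 100%

    print(f"RMM={rmm:.3f}, audit risk={audit_risk:.3f}, assurance={assurance_level:.1%}")
    # RMM=0.240, audit risk=0.048, assurance=95.2%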

Example Embodiment

The example embodiment discussed below is demonstrated using an audit of revenue and receivables. The revenue and receivable accounts capture the revenue generated through the order to cash process. The order to cash process includes the creation of a sales order, preparation of shipping (if the order involves a shipment), invoicing of the customer, and receipt of the payment when the customer pays. This process is repeated for all the transactions that are recorded in the revenue account within the general ledger.

During the order to cash process, various information systems may need to participate in the business process. The sales orders are captured in the order management system (which can be part of the ERP system), which will trigger the warehouse management system to prepare the shipment according to the delivery date. When the product is shipped, the inventory management system will record the reduction of the inventory, and the order management system will invoice the customer (based on the delivery term). While invoicing the customer, this transaction will be posted in the revenue account (credit) and the accounts receivable account (debit). And when the payment is received, it will be recorded in accounts receivable (credit) and the cash account (debit) within the general ledger.

The audit of a revenue account may need to validate the value in the account by tracing the transaction through the system in conjunction with the corroborating evidence to ensure that each transaction has been posted correctly according to the accounting policy ASC 606 (IFRS 15). The sales order is vouched against the purchase order, the shipment is vouched against the Bill of Lading, and the payment is vouched against a variety of payment details such as bank statement, credit card processor settlement report, ACH daily report, etc.

Data Integrity

Data integrity is intended to establish existence (or occurrence), completeness, and accuracy for the audit process. Data integrity includes vouching and tracing:

Vouching refers to the inspection of documentary evidence supporting and substantiating a transaction. It is the practice of establishing the authenticity of the transactions recorded in the primary books of account. It includes verifying a transaction recorded in the books of account against the relevant documentary evidence and the authority on the basis of which the entry has been made, and confirming that the amount mentioned in the voucher has been posted to an appropriate account that discloses the nature of the transaction on its inclusion in the final statements of account. In some embodiments, vouching does not include valuation.

Tracing is the process of following a transaction in the accounting records back to the source document. This may involve locating an item in the general ledger, tracing it back to a subsidiary ledger (if necessary) to look for the unique identifying document number, and then going to the accounting files to locate the source document. Tracing is used to track down transactional errors, and also to verify that transactions were recorded properly.

Tracing provides evidence for completeness. Vouching provides evidence for occurrence. Tracing from a document to the financial statement may indicate completeness but not occurrence, because there are pieces of that overall financial statement number that haven't been looked at. Vouching may indicate occurrence, but not completeness, as an original document may be missing (e.g., if it was not included in a financial statement to begin with).

The modality of the documentary evidence, in some embodiments, is in the form of documents, whether a PDF file, a Word file, an Excel spreadsheet, or an email. Evidence provided by third parties, such as a bank, a shipping company, or a customer of an entity, may serve as better evidence than evidence produced directly by the audited entity. Evidence provided by a third party in a digital form, such as data that can be directly acquired through an API or web portal, may, in some embodiments, provide the strongest evidence. Evidence available in structured or semi-structured form without requiring further interpretation, such as EDI, may also provide accurate corroboration, when it is available. Documents in the form of Excel, Word, or email may require the use of natural language processing to comprehend, while scanned documents may require additional OCR to extract the characters, words, entities, paragraphs, and tables from the documents.

In some embodiments, data integrity validation may be performed as follows for each of the following kinds of FSLI:

    • Revenue and Receivables: evidence may include one or more of: purchase order, various forms of shipping confirmation (e.g., bill of lading, proof of delivery, packing slip, packing list, shipping confirmation from a third party such as Shippo.com), various forms of payment details (e.g., cash receipts, bank statements, eChecks, remittance advice, ACH report, information from a third party such as plaid.com), transaction and settlement report for credit card, and/or various forms of contracts. Note that some of this documentary evidence could be in the form of EDI messages.
    • Expense and Payables: evidence may include one or more of: invoices, proof of delivery or goods received, payment details, and/or various forms of contracts. Note that some of this documentary evidence could be in the form of EDI messages.
    • JEs: evidence may include one or more of: various supporting documents for JE entries such as invoices, cash receipts, Excel, Word, pdf, emails, and/or various electronic evidence. The cash and bank reconciliation may involve bank statements as well, to confirm assertions within the cash accounts within the chart of accounts of the G/L.
    • Cash and Cash Equivalents: evidence may include one or more of: bank statements and/or lockbox cash management daily reports.
    • Property, Plant and Equipment (including lease accounting): evidence related to capital assets includes lease agreements, evidence supporting physical custody of the asset (including images and video), repair receipts, and various documents supporting the depreciation calculation.
    • Inventory: evidence may include demonstration of physical custody—such as images and/or video—and shipping details to demonstrate the movement of inventory.

It should be noted that, in some embodiments, the same set of documents may be used for data integrity validation for various FSLIs. As an example, information from shipping documents may be used for both revenue & receivables as well as for inventory FSLIs.

Process Integrity

Process integrity may evaluate the consistency of a process for each step of the process, both on the business process side and the accounting process side. In some embodiments, process integrity validation may be performed as follows for each of the following kinds of FSLI:

    • Revenue and Receivables: includes validation from sales order to invoice, invoice to inventory relief, invoice to revenue G/L, invoice to account receivable, invoice to customer transactions, payment journal to account receivable, credit memo to inventory return, credit memo to account receivable, and/or credit memo to revenue.
    • Expense and Payables: includes validation from purchase requisition to account payable, purchase requisition to expense, payment journal to account payable, treasury to cash, purchase requisition to inventory addition, and/or various return processing.
    • JEs: includes validating business processes involving creating and adjusting journal entries, including those flowing from the revenue (invoice to account receivable, invoice to cash accounts within G/L), expense, equities, and/or liabilities.
    • Cash and Cash Equivalents: includes validating business processes involving the cash and cash equivalent within chart of accounts—including payment journal to cash, treasury to cash.
    • Property Plant and Equipment (including lease accounting): includes business processes involving the setup, operation & maintenance, and/or disposal of PPE.
    • Inventory: related business processes that touch the inventory ledger include inventory relief and/or inventory return.

It should be noted that, in some embodiments, many business processes touch upon one or more FSLI audits. As an example, the payment journal to cash process exists in revenue and receivables, JEs, and Cash and Cash Equivalents.

Policy Integrity

In some embodiments, policy integrity validation may be performed as follows for each of the following kinds of FSLI:

    • Revenue and Receivables: Pertinent accounting standards include ASC 606 (IFRS 15) for “Revenue Recognition from Contracts with Customers”.
    • Expense and Payables: Pertinent accounting standards include ASC 705 cost of sales and services. Separate accounting standards exist for compensation (ASC 710, ASC 712, ASC 715, and ASC 718), R&D (ASC 730), and income tax (ASC 740).
    • JE: Pertinent accounting standards include ASC 210—Balance Sheet, ASC 220—Income Statement, ASC 225—Income Statement, and ASC 230 Statement of Cash Flow.
    • Cash and Cash Equivalents: Pertinent accounting standards include ASC 210—Balance Sheet (formerly ASC 305).
    • Property Plant and Equipment (including lease accounting): Pertinent accounting standards include ASC 842 (IFRS 16), which replaced ASC 840 at the beginning of 2019.
    • Inventory: Pertinent accounting standards include ASC 330.

Orchestration

As shown in FIG. 3, an orchestration engine may be used to orchestrate underlying modules within the data integrity system (for example, vouching and tracing of purchase orders and bank statements), the process integrity system (for example, validating the order to cash process), and the policy integrity system (for example, modules related to adjudicating revenue recognition based on ASC 606). The orchestration engine may be configured to consider the dependency among these integrity validation systems, as policy integrity may be dependent on results from data integrity and process integrity. In some embodiments, data and process integrity could largely run concurrently, as they may not have dependency with respect to each other. Within each integrity module, the orchestration engine may leverage maximal concurrency among modules.
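
The following is a hypothetical sketch of the orchestration pattern described above: data integrity and process integrity run concurrently, and policy integrity is deferred until both complete; the engine functions are placeholders, not the actual integrity systems.

    # Hypothetical sketch of the orchestration pattern described above: data and
    # process integrity run concurrently, and policy integrity waits on both.
    # The engine functions are placeholders.
    from concurrent.futures import ThreadPoolExecutor

    def data_integrity():
        return {"data_ok": True}

    def process_integrity():
        return {"process_ok": True}

    def policy_integrity(data_result, process_result):
        # Policy integrity depends on the results of the other two systems.
        return {"policy_ok": data_result["data_ok"] and process_result["process_ok"]}

    with ThreadPoolExecutor() as pool:
        data_future = pool.submit(data_integrity)
        process_future = pool.submit(process_integrity)
        policy_result = policy_integrity(data_future.result(), process_future.result())

    print(policy_result)  # {'policy_ok': True}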

Descriptions of characteristics and features of various modules are included below.

Data Integrity Modules

Invoice Vouching:

This module performs symmetric vouching and tracing between invoice data in the ERP system and invoice data extracted (e.g., using ABBYY FlexiCapture) from the physical invoice after post-processing has been performed. Post-processing may involve the use of master data to normalize the customer name, customer address, line item number, line item descriptions, and customer item number. Identification of the ERP data entry with the extracted document entry may be determined by the Invoice Number, and fuzzy comparison may be performed on the given configurable input list of fields to compare.
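
As an illustrative sketch of the fuzzy field comparison described above, the following example keys on the invoice number and scores the remaining configured fields with a simple string-similarity ratio; the field names, sample data, and similarity function are assumptions rather than the module's actual implementation.

    # Illustrative sketch of fuzzy field-by-field comparison between an ERP
    # invoice record and fields extracted from the physical invoice, keyed on the
    # invoice number. Field names, sample data, and the similarity function are
    # assumptions rather than the module's actual implementation.
    from difflib import SequenceMatcher

    def fuzzy(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

    erp = {"invoice_no": "INV-1001", "customer": "Acme Corporation", "amount": "1,250.00"}
    extracted = {"invoice_no": "INV-1001", "customer": "ACME Corp.", "amount": "1250.00"}

    fields_to_compare = ["customer", "amount"]  # configurable input list of fields
    if erp["invoice_no"] == extracted["invoice_no"]:
        scores = {f: fuzzy(erp[f], extracted[f]) for f in fields_to_compare}
        print(scores)  # per-field similarity scores in [0, 1]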

Purchase Order Vouching:

This module performs symmetric vouching and tracing between sales order data in the ERP system and the purchase order data extracted (e.g., using a template-based approach such as ABBYY FlexiCapture or a templateless approach) from the physical PO after post-processing has been performed. Post-processing may involve the use of master data to normalize the customer name, customer address, line item number, line item descriptions, and customer item number. Identification of the ERP data entry with the extracted document entry may be determined by the PO Number, and fuzzy comparison may be performed on the given configurable input list of fields to compare.

Bill of Lading Vouching:

This module performs symmetric vouching and tracing between invoice data in the customer ERP system and bill of lading forms data extracted (for example using ABBYY Flexicapture) from the physical Bill of Lading—including packing slip, packing list, and/or BoL form after post-processing has been performed. Identification of the ERP data entry with the extracted document entry is determined by the Sales order or Invoice number, and fuzzy comparison is performed on the given configurable input list of fields to compare.

Third-Party Shipping Record Vouching:

Using the multi-carrier shipment tracking API (e.g. Shippo), this module may validate the accepted date, delivered date, ship from address, and/or ship to address of a given shipment.

Payment Vouching:

The Cash Receipts Vouching Module compares the ERP journal payment entry with evidence of payment from various supporting documents such as bank statement, eChecks, Remittance Advice, daily ACH report, and/or credit card settlement report. One or more of two different algorithms may be used to attempt to match journal voucher data with bank statement data. The first algorithm is a “Fuzzy Date+Amount” algorithm. Under this first algorithm, journal vouchers and bank statements are matched by considering their date and amount; in the case of date, a certain window of days for matching (+/−delta_days_windows) may be allowed, as there may be small discrepancies between date recorded on a bank statement versus on a journal entry. The second algorithm is a “Knapsack Amount Matching” algorithm. Under this second algorithm, in cases such as counter deposit or a lump-sum deposit, a single bank transaction can map to a number of journal vouchers. Knapsack matching allows consideration of groups of journal vouchers matching to a single bank statement transaction, and may return several possible groups of journal vouchers that sum to the bank statement transaction's amount. To pick the optimal from several possible groups, the system may select the group of journal vouchers that has the highest match score, wherein the match score may be based on fuzzy comparing customer name and mentioned invoices' amounts.
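
The following is a hypothetical sketch of the “Knapsack Amount Matching” idea: it searches for groups of journal vouchers whose amounts sum to a single lump-sum bank deposit. The brute-force subset search and sample data are illustrative only; a production implementation would also apply the match-score tie-breaking described above.

    # Hypothetical sketch of the "Knapsack Amount Matching" idea: find groups of
    # journal vouchers whose amounts sum to a single lump-sum bank deposit.
    # Amounts are in cents to avoid floating-point issues; the brute-force subset
    # search and sample data are illustrative only.
    from itertools import combinations

    bank_deposit_cents = 150_000  # one lump-sum deposit of $1,500.00
    vouchers = [("JV-1", 50_000), ("JV-2", 100_000), ("JV-3", 75_000), ("JV-4", 25_000)]

    candidate_groups = []
    for r in range(1, len(vouchers) + 1):
        for group in combinations(vouchers, r):
            if sum(amount for _, amount in group) == bank_deposit_cents:
                candidate_groups.append([vid for vid, _ in group])

    print(candidate_groups)
    # [['JV-1', 'JV-2'], ['JV-1', 'JV-3', 'JV-4']] -- the best group would then be
    # picked by the match score (e.g., fuzzy-comparing customer names and invoices).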

Process Integrity Modules

Sales Order to Invoice:

This module checks correspondence between sales orders and invoices to ensure that the customer information (e.g., name, billing & shipping addresses, line items in terms of item number, descriptions, and/or unit price) is consistent. This module also helps to validate partially invoiced sales orders.

Invoice to Customer Transaction:

This module checks the correspondence between sales in the customer transaction table (filtered for sales) and the Sales Invoice Headers table. The system may check that each Invoice Number (e.g., primary key) from Customer Transactions is present in Sales Invoice Headers, and vice-versa.

Payment Journal to Customer Transaction:

This module checks transactions from the PaymentJournal against the payments in Transactions. The system may perform checks at the TransactionID level. The system may fuzzy check on the amount, customer account number, date, and/or currency. Moreover, the system may assign reason codes to any missing transaction in one of the tables, discrepancies between the columns, loss of information during the aggregation from InvoiceNumber to TransactionID level (for the Payment Journal), and/or the relationship between PaymentJournal.InvoiceNumber and PaymentJournal.TransactionID (one-to-one/many-to-one).

Account Receivable Roll Forward:

This module is designed to present a beginning balance and reconcile it, by reviewing the current period accounts receivable activities for accuracy, to arrive at the ending balance. The module starts with the LedgerTransactionList table and performs COA number and financial period filtering to identify journal entries of interest. Afterwards, VoucherNumbers from the identified entries are used to perform a left join on GeneralLedgerARPosted and GeneralLedgerARRemoved to fetch voucher header information and invoice-level information. Note that the AR is posted when invoicing the customer and removed when the payment is received. For each entry, the type of account receivable activity, original invoice amount, recalculated invoice amount, and/or match metrics are identified and calculated.
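
As a simplified illustration of the roll-forward arithmetic described above, the following sketch starts from a beginning balance, applies posted and removed AR activity, and compares the computed ending balance to the recorded one; all figures are hypothetical.

    # Simplified illustration of the roll-forward arithmetic described above; all
    # figures are hypothetical.
    beginning_balance = 10_000.00
    activities = [
        {"type": "posted", "amount": 2_500.00},    # AR posted when invoicing
        {"type": "posted", "amount": 1_200.00},
        {"type": "removed", "amount": 3_000.00},   # AR removed when payment received
    ]
    recorded_ending_balance = 10_700.00

    computed = beginning_balance
    for a in activities:
        computed += a["amount"] if a["type"] == "posted" else -a["amount"]

    print(computed, computed == recorded_ending_balance)  # 10700.0 True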

AR Removed Extended:

This module may validate that AR-removed entries correspond to payment-received vouchers and can be linked to a corresponding invoice.

Return of Inventory:

This module ensures that items related to a credit memo were properly added back to inventory on the financial statements if they were supposed to be (e.g., by validating that they were not scrapped or sent back to the customer). The accounting event that occurs at the time of return includes a debit to inventory and a credit to COGS; whereas, a second entry includes a debit to Revenue and a credit to AR. As such, in some embodiments, for every event that an item was credited and not scrapped, there should be an event that adds the item back in the inventory subledger. The module attempts to determine if there is agreement between the item number, unit of measure, and/or quantity of each credit memo to the inventory ledger, while also ensuring that the date of both events occurs during the fiscal year. Each credit memo is assigned a unique identifier that is also present in the inventory ledger (e.g., voucher number), and this is used to identify the existence of the record in both tables. Each credit memo found in the inventory ledger is assigned a binary score of 1 or 0 based on whether or not the voucher assigned to the credit memo was found in the inventory ledger. The module then compares the aforementioned metrics based on fuzzy logic or exact match (quantity only). This step allows the system to determine if inventory was properly added back.

Credit Memo to Customer Transactions:

This module is designed to ensure that credit memos are also included in the customer transaction table. Each credit memo is identified by an invoice number ending in ‘CCN’ that is also present in the customer transactions filtered by type of transaction (Sales), and this is used to identify the existence of the record in both tables. Each record found in the invoice table (filtered for credit memos) and the transaction table is assigned a binary score of 1 or 0 based on whether or not it is found in both data sets. Similarly, checks are performed on the amount, customer, date, and/or currency to check the accuracy and validity of the data.

Payment Journal to Customer Transactions:

This module is configured to check transactions from the PaymentJournal against the payments in Transactions. The system performs checks at the TransactionID level. The system performs fuzzy checks on the amount, customer account number, date, and/or currency. Moreover, the system assigns reason codes to any missing transaction in one of the tables, discrepancies between the columns, loss of information during the aggregation from InvoiceNumber to TransactionID level (for the Payment Journal), and/or the relationship between PaymentJournal.InvoiceNumber and PaymentJournal.TransactionID (one-to-one/many-to-one).

Inventory Relief:

This module is designed to verify that items being invoiced were properly relieved from inventory. The module attempts to match the item number, unit of measure, quantity, and/or date from invoice lines to the inventory ledger. Additionally, it flags items shipped but not invoiced, items invoiced in advance of relief, and/or invoice/shipping dates that cross fiscal periods. It should be noted that invoiced lines not considered to be an item, such as a service, may not be expected to be relieved because they may not, in some embodiments, be relevant to inventory. Items invoiced with a zero quantity may also not, in some embodiments, be expected to be relieved. Each invoiced line is assigned a common identifier that is also present in the inventory, and this may be used to identify the existence of the record in both tables. Each invoice found in the inventory ledger is assigned a binary score of 1 or 0 based on whether or not the voucher assigned to the invoice was found in the inventory ledger. The module then compares the aforementioned metrics based on fuzzy logic, while expecting an exact match. This step allows the system to determine if inventory was properly relieved. These checks allow the system to ensure that items being recognized as revenue were removed from inventory on the financial statements. The accounting event that occurs at the time of shipment includes a debit to COGS and a credit to Inventory; whereas, the accounting event that occurs at the time of revenue recognition includes a debit to AR and a credit to Revenue. As such, for every event that an item was invoiced, there should be an event that removes the item from the inventory subledger.

Policy Integrity Modules

Transfer of Control:

This module uses the shipping term to determine whether the transfer of control occurs at the shipping point, the delivery point, or somewhere in between. This enables testing whether the obligation is completed before or after the boundary of the accounting period.
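
As a hypothetical sketch of this module's logic, the following example maps a shipping term to a transfer-of-control date and compares it to the period end; the term-to-point mapping and dates are simplified assumptions rather than an exhaustive treatment of shipping terms.

    # Hypothetical sketch of mapping a shipping term to a transfer-of-control date
    # and testing it against the period boundary; the term-to-point mapping and
    # dates are simplified assumptions, not an exhaustive treatment of Incoterms.
    from datetime import date

    SHIP_POINT_TERMS = {"FOB ORIGIN", "EXW", "FCA"}     # control at shipping point
    DELIVERY_POINT_TERMS = {"FOB DESTINATION", "DDP"}   # control at delivery point

    def control_transfer_date(shipping_term, ship_date, delivery_date):
        term = shipping_term.strip().upper()
        if term in SHIP_POINT_TERMS:
            return ship_date
        if term in DELIVERY_POINT_TERMS:
            return delivery_date
        return None  # somewhere in between / flag for manual review

    period_end = date(2021, 12, 31)
    transfer = control_transfer_date("FOB Destination", date(2021, 12, 30), date(2022, 1, 2))
    print(transfer, transfer > period_end)  # 2022-01-02 True -> recognize in the next period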

Contract Approved and Committed:

This module/case-study is designed to identify the contract, identify performance obligations, and identify commercial substance. The likely sources of potential misstatement covered in this section are unauthorized changes, erroneous sales orders/contracts, orders not entered correctly, inappropriate allocation of transaction price, separate performance obligations that are not in accordance, and/or separate performance obligations that are not appropriately accounted for.

Fixed Consideration:

This module/case study is designed to recognize the unit price per PO and any difference noted from the unit price reported in the ERP. The likely sources of potential misstatement covered in this section are that invoice pricing is not approved or is not entered in the system appropriately, that total contract consideration (including cash, non-cash, fixed, and variable consideration) is not accurately or completely identified, and/or that the transaction price is not appropriately determined in accordance with IFRS 15/ASC 606.

Calculated Expected Revenue:

This module recalculates the expected revenue after taking into account the existence of an agreement (e.g., contracts), identifying the obligations, determining the transaction price, allocating the transaction price to the obligations, and determining the final revenue that can be recognized.

ASC 606 (IFRS 15) may be mapped to data, process, and policy integrity.

In some embodiments, any one or more of the data processing operations, cross-validation procedures, vouching procedures, and/or other methods/techniques depicted herein may be performed, in whole or in part, by one or more of the systems (and/or components/modules thereof) disclosed herein.

Context Aware Data Integrity

Information integrity (also referred to as data integrity) may be defined as the representational faithfulness of the information to the underlying subject of that information and the fitness of the information for its intended use. Information integrity, including vouching and tracing, is essential for FSLI audits in terms of satisfying two foundational assertions—completeness and existence. Vouching validates the values of the entries in the ledger against their supporting documents (or underlying representation of the real world), while tracing validates each of the documents (or the representation of the real world) and traces it to the entries in the ledger. Vouching is used to establish the “existence” assertion, while tracing is used to establish the “completeness” assertion.

According to known techniques, vouching and tracing are done independently as two separate processes when the audit is based on sampling. For example, the sampling rate could be 1-5% of all available transactions during a typical audit. However, known systems and methods for information integrity do not handle fuzzy comparison, do not leverage the context of the evidence (e.g., master data, industry ontology, industry and client knowledge), do not leverage multiple pieces of evidence to establish data integrity, do not address the challenge that evidence might have been amended or updated, and do not address one-to-many/many-to-one/many-to-many relationships. Accordingly, there is a need for improved systems and methods that address one or more of the above-identified shortcomings.

Disclosed herein are methods and systems for performing automated (or semi-automated) data processing operations for auditing processes, wherein vouching and tracing (e.g., for FSLI audit for multiple documents and ERP records) are conducted semi-automatically or fully automatically at the same time, wherein the specification and the actual matching of the corresponding fields in the ledger and the supporting source documents are performed automatically.

The systems and methods disclosed herein may provide improvements over known approaches in a variety of ways. For example, the systems and methods disclosed herein may perform vouching and tracing simultaneously, as opposed to performing them as two separate processes and/or at two separate times. The systems and methods may classify a collection of documents and identify available evidence for going through the representation faithfulness testing. The systems and methods may simultaneously leverage multiple pieces of evidence (e.g., process more than one piece of evidence in accordance with a single application of a data processing operation) that weigh on a single assertion (perhaps contradictorily).

Furthermore, the systems and methods disclosed herein may leverage a progressive framework to organize the available evidence to ensure fast direct matching while allowing maximal opportunity for matching evidence with higher ambiguity. The systems and methods may progressively organize ERP/ledger data and the collections of unstructured documents based on primary identifiers. Given the potential ambiguity in terms of extracting the identifiers from documents, these documents could potentially be placed in multiple groups.

Furthermore, the systems and methods disclosed herein may leverage a fuzzy comparison framework to allow potential minor deviations. The systems and methods may simultaneously compare and match the entries from the ledger and unstructured documents. The systems and methods may use fuzzy comparison for numbers and strings from the ledger and unstructured documents.

Furthermore, the systems and methods disclosed herein may leverage contextual information—both endogenous and exogenous information and knowledge including master data—to ensure that the data is fully comprehended in context. The systems and methods may automatically match supporting field(s) within a document through machine learning (including deep learning, reinforcement learning, and/or continuous learning).

Furthermore, the systems disclosed herein may have the ability to continuously/iteratively improve their performance, e.g., based on machine learning and processing of feedback, over time. The systems and methods may automatically augment supporting documents with additional contextual knowledge.

Described below are additional features, characteristics, and embodiments for systems and methods for semi-automated or fully-automated simultaneous vouching and tracing for data integrity. In some embodiments, any one or more of the data integrity techniques discussed herein may be used as part of a composable assurance integrity system such as those described herein. In some embodiments, any one or more of the data integrity techniques discussed herein may share any one or more characteristics/features with a data integrity technique discussed above with respect to a composable assurance integrity framework.

In some embodiments, a system may be configured to perform one or more data processing operations for establishing the representation faithfulness for financial data that are usable to determine whether there are any material misstatements, e.g., in FSLIs.

The system may establish a subset of data within a financial system (such as an ERP system) within the specified period (e.g., accounting period) for which validation of representation faithfulness through vouching and tracing between data in financial systems and various evidence is to be performed. Note that some of the data, such as inventory, shipping, and/or payment data, may be applicable to multiple FSLIs. The subset selection may be based on a combination of best practice, prior knowledge of the specific industry, and/or specific client considerations. The window for conducting representation faithfulness validation may start earlier and may end later than an accounting period based on the cutoff criteria, best practice, and/or industry- and client-specific knowledge. Information regarding subset selection may be indicated by user input and/or automatically determined by the system.

The system may establish a set of (potentially multi-modal) evidence (including its provenance/lineage) that may be required to validate the representation faithfulness for the selected subset of data. In some embodiments, evidence may be in structured or semi-structured form, such as EDI message for PO, bank statements, and/or shipping information. Available evidence and its provenance may be recommended based on best practices, for example as indicated by one or more user inputs received by the system.

In some embodiments, a finance system may capture the final state of an agreement or a transaction. Tracing through the entire history, starting from the original agreement and following subsequent (e.g., multiple) amendments, may be required to fully substantiate the current state of the financial system.

Multiple pieces of multi-modal evidence may be required to substantiate a single entry in a finance system. As an example, a sales order in the financial system might require substantiating the unit price from a sales contract and the quantity from an EDI message. As another example, email correspondence may be used to amend original purchase orders or contracts.

The system may collect evidence (e.g., each piece of evidence may include one or more fields) associated with entries in the financial system (e.g., with one or more fields) at the transaction level, where the association may be defined by a similarity metric between the evidence and the data in the financial system. In some embodiments, collection and/or selection of evidence may be based on automatic processing by the system that may be performed on the basis of the identification, by the system, as to what pieces of evidence are needed to validate the financial data that has been selected for validation. In some embodiments, a user may specify which evidence should be collected.

One or more pieces of evidence may be represented by one or more feature vectors. One or more entries from the financial system may be represented by one or more feature vectors. Feature vectors may be generated and stored by the system based on applying one or more data processing and/or information extraction operations to the collected evidence and collected financial system entries.

In some embodiments, the system may represent one or more pieces of evidence as a feature vector. The system may generate and store feature vectors based on documents or other data representing evidence that is received by the system. The system may be configured to generate one or more feature vectors to represent a subset of data within a financial system (such as an ERP system) within a specified time period (e.g., an accounting period) where validation of representation faithfulness through vouching and tracing between data in the financial systems and various evidence is to be performed. The system may be configured to generate one or more feature vectors to represent one or more of a set of (potentially multi-modal) evidence (including its provenance/lineage) that may be used to validate the representation faithfulness.

In some embodiments, the system may be configured to encode contextual information into one or more feature vectors, thereby capturing context awareness. For example, feature vectors may be generated, at least in part, based on metadata, file names, and/or other contextual information when obtaining documents from a document repository or other data source. Feature vectors may also be generated, at least in part, based on computing from the content extracted from the evidence, such as purchase order number, invoice number, payment journal ID, amount, and/or customer name. Feature vectors may also be generated, at least in part, based on computing from additional contextual information, whether endogenous and/or exogenous. A feature vector for field-level evidence may be (or include, or be based on) the value of the field itself.

In some embodiments, the system may compute a similarity metric that quantifies/scores the similarity between evidence (e.g., ingested documents) and the data in the financial system (e.g., FSLIs) to determine the association between a record in the financial system and the evidence. This may establish the potential one-to-one, one-to-many, many-to-one, and/or many-to-many relationships among evidence and data from the financial systems. Computing of similarity metrics may be based on a feature vector representing one or more of the pieces of evidence and/or based on a feature vector representing one or more pieces of data in the financial system. In some embodiments, the system may use one or more weights in the similarity metric calculation. Said weights may be prescribed by a user and/or may be trained using a machine learning model, e.g., with continuous learning based on the observed performance of the similarity metric. Computing the similarity metric between feature vector(s) representing the evidence and the data from financial systems may be conducted based on dynamic programming.
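
As a minimal sketch of such a similarity computation, the following example applies a weighted cosine similarity between a feature vector for a piece of evidence and a feature vector for a financial-system record; the features, weights, and cosine formulation are illustrative assumptions, and in practice the weights might be learned by a machine-learning model as noted above.

    # Minimal sketch of a weighted similarity metric between a feature vector for
    # a piece of evidence and a feature vector for a financial-system record.
    # Features, weights, and the cosine formulation are illustrative assumptions;
    # in practice the weights might be learned by a machine-learning model.
    import math

    def weighted_cosine(u, v, w):
        num = sum(wi * ui * vi for ui, vi, wi in zip(u, v, w))
        nu = math.sqrt(sum(wi * ui * ui for ui, wi in zip(u, w)))
        nv = math.sqrt(sum(wi * vi * vi for vi, wi in zip(v, w)))
        return num / (nu * nv) if nu and nv else 0.0

    evidence_vec = [1.0, 0.93, 0.69]  # e.g., invoice-number, amount, name features
    record_vec = [1.0, 1.00, 1.00]
    weights = [0.5, 0.3, 0.2]         # relative importance of each feature

    print(round(weighted_cosine(evidence_vec, record_vec, weights), 3))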

In some embodiments, the system may generate output data that indicates a level, quantification, classification, and/or extent of representation faithfulness, wherein the output may be generated based on the similarity metric. In some embodiments, the output may be based on selecting a subset of similarity metrics that indicate the highest level of similarity. In some embodiments, the output may be based on performing classification and/or clustering based on computed similarity metrics.

In some embodiments, the output may be generated in accordance with the following. The system may establish representation faithfulness based on ontological representation of the entries in the financial system to the individual fields within the entries. A measure of representation faithfulness may be based, at least in part, on confidence (based, e.g., on the similarity metric computed) that similarity exists between the evidence and data in the financial system. Sufficiency of the representation faithfulness at each level of an entry may be established either through explicit specification or implicit models. The association between evidence and transactions/entries in the financial systems may be one-to-one, one-to-many, many-to-one, or many-to-many. Representation faithfulness can be determined based on direct evidence, circumstantial evidence, or both.

In some embodiments, the systems and methods disclosed herein may be configured in accordance with the following dimensions for consideration in evidence matching:

    • Data modality
      • Excel
      • Web forms
      • ERP
    • Evidence Modality
      • OCR+Document Understanding
        • Scanned pdf,
        • Signature, Handwriting
      • Document Understanding
        • Email
        • Word, Excel
        • EDI
        • XBRL
    • Evidence type
      • Invoice
      • PO
      • BoL, PoD
      • Contract/Lease
      • 8K/10K/10Q
      • Tax Returns
    • Entity Extraction
      • Header vs. Line
      • PO #, Invoice #
      • Amount (Line, Total)
      • Date
      • Customer Name/Address
      • Product description/SKU
      • Delivery terms/Payment Terms
      • Quantity, Unit Price
      • Currency
    • Normalization
      • Master data (customer master, product master)
      • Ontology (e.g. incoterm 2020)
      • ERP variation
      • Client variation
    • Contextualization
      • Order to Cash
      • Procure to Pay
      • Record to Report
    • Source of Context
      • Endogenous
      • Exogenous
    • Matching Approach
      • Direct vs. Circumstantial
        • Precise
        • Fuzzy (with similarity score & confidence level)
        • Knapsack
        • Fuzzy Knapsack
      • Passive vs. Active
        • Passive approach compares data from document understanding and entity extraction
        • Active approach generates multiple alternative hypotheses to determine whether an evidence match exists
    • Type of Matching
      • 1:1
      • 1:Many (e.g. one payment allocated to multiple invoices)
      • Many:1 (e.g. multiple payments allocated to the same invoice)
      • Many:Many (e.g. multiple lines on SO reconcile to multiple lines on invoice or PO)
    • Multiple Evidence
      • Rationalizing multiple matches made simultaneously with relative priority in contributing to the full vouching
    • Versioning/Amendments
      • Change orders
      • Amendments

FIG. 5 depicts one example of leveraging multiple evidence in data integrity.

Example Embodiment

Information integrity is one of the five pillars of Information assurance (IA)—availability, integrity, authentication, confidentiality, and non-repudiation—and one of the three pillars of information security—availability, integrity, and confidentiality (frequently known as CIA triad). It is also foundational to the assurance of financial statements.

Information integrity may be defined as the representational faithfulness of the information to the underlying subject of that information and the fitness of the information for its intended use. Information can be structured (e.g., tabular data, accounting transactions), semi-structured (e.g., XML) or unstructured (e.g., text, images and video). Information consists of representations regarding one or more events and/or instances that have been created for a specified use. Such events or instances can have numerous attributes and characteristics that may or may not be included in a set of information, depending on the intended use of the information.

There are various risks associated with the design, creation and use of information as well as when performing an attestation engagement on its integrity. Four types of risks to information integrity have been suggested in the AICPA Whitepaper on Information Integrity (2013):

    • 1. Subject matter risk is the risk that suitable criteria cannot be developed for the events or instances and that information about the events or instances is inappropriate for the use for which it is intended (its fitness for purpose). It may include: (1) the attributes of interest related to the event or instance, or the environmental attributes and other meta-information, may not be observable or measurable; (2) the information that can be supplied is misleading or is likely to be misunderstood by its intended recipient.
    • 2. Use risk is the risk that the information will be used for other than its intended purpose, used incorrectly, or not used when it should be. It includes: (1) an intended user will make use of the information for purposes beyond its intended use, or fail to use information for its intended uses, resulting in erroneous decision-making or misunderstanding on the part of the user; (2) someone other than the intended user will make use of the information, resulting in a misunderstanding on the part of the user or an erroneous decision.
    • 3. Information design risk includes those risks of misstatement that arise from the failure of the information design to address subject matter and use risks, as well as the risks inherent in the activities that occur throughout the lifecycle of the information.
    • 4. Information processing lifecycle risk includes those risks that are introduced during the life cycle of particular pieces of information: (1) creation or identification of data; (2) measurement; (3) documentation or recording; (4) input; (5) processing, change, or aggregation to transform data into information; (6) storage or archiving; (7) output or retrieval; (8) use; (9) destruction.

All the risks discussed above show that the integrity of information depends on the integrity of the meta-information. These risks and their nature therefore may, in some embodiments, be considered when reporting on information.

Within the professional standards, opinions related to the integrity of information are arrived at by measuring or evaluating the information reported against suitable criteria. Since the criteria are closely related to the meta-information, it follows that the identification of criteria requires an analysis of the meta-information necessary to understand the subject matter. Information that contains complete meta-information would provide a greater array of possible criteria for evaluating information integrity or reporting on information integrity. For example, if the meta-information states that the information is prepared in accordance with generally accepted accounting principles, then this could be the criteria used for evaluating the information. In some embodiments, criteria must be suitable, which means they must be objective, measurable, complete and relevant. Accordingly, in some embodiments, the criteria must be identifiable and capable of consistent evaluation; consistent not only between periods but between entities or similar circumstances. In addition, it is important that the criteria can be subjected to procedures for gathering sufficient appropriate evidence to support the opinion or conclusion provided in the practitioner's report. Moreover, in some embodiments, metrics may be selected that address the risks that were identified.

Information integrity in the context of a financial statement audit focuses on the representational faithfulness of the information used in the financial statement audit. A financial statement audit covers categories that are often referred to as Financial Statement Line Items (FSLIs). A subset of these FSLIs is listed below:

1. Revenue & Account Receivables

2. Expense and Account Payable

3. Journal Entries

4. Cash & Cash Equivalents

5. Inventory

6. Cost of Goods Sold

7. Prepaid & other Client Asset

8. PPE, Lease and Depreciation

9. Investment

10. Goodwill & Intangible

Each of these line items may require the line-item information in the general ledger to be connected to the real world. As an example, accounts receivable will need to be connected to the invoice and purchase order, accounts payable will need to be connected to the purchase order, invoice, and goods received, journal entries will need supporting documents, and inventory requires direct observation of the warehouse.

This disclosure addresses one of the major risk areas of information integrity, namely information processing lifecycle risk, as it recurs in the handling of each financial report. Other risk areas of information integrity are out of scope for this disclosure.

Common techniques for establishing representation faithfulness for most of the FSLIs are based on documents (paper or electronic) that capture the events taking place in the real world, using vouching and tracing techniques.

A vouching approach for PO based on sampling may involve the following steps:

    • 1. Establish a sample collection of the transactions that need to be vouched within the transactions from ERP. The sampling could be a combination of:
      • Transactions that are most significant based on dollar amount
      • Transactions that might have the highest risk or uncertainty due to the nature of the transaction
      • Stratify the transactions so that a different sampling rate is applied to each band, e.g., a higher sampling rate for transactions with a higher dollar amount or risk
      • Statistical sampling across the entire population
    • 2. Locate the documents in the document repository, assuming the document can be accessed based on the PO number (or equivalent unique identifier)
    • 3. Validate the identification of the transaction, which could be the Purchase Order number, potentially in conjunction with the date and revision, to uniquely identify the appropriate version of the PO when there are potential amendments and revisions of the PO
    • 4. Validate the customer name, addresses (for ship to and bill to), shipping terms, and payment terms
    • 5. Validate individual lines in terms of quantities and unit price
    • 6. Validate the total amount

Tracing, on the other hand, may follow the steps below:

    • 1. Establish a sample collection of the documents that need to be traced within the document repository. The sampling could be a combination of
      • a. Statistical sampling of documents
    • 2. Locate the corresponding transaction in the financial system (e.g., the ERP), assuming the transaction can be found based on the PO number (or equivalent unique identifier)
    • 3. Validate the identification of the transaction, which could be the Purchase Order number, potentially in conjunction with the date and revision, to uniquely identify the appropriate version of the PO when there are potential amendments and revisions of the PO
    • 4. Validate the customer name, addresses (for ship to and bill to), shipping terms, and payment terms
    • 5. Validate individual lines in terms of quantities and unit price
    • 6. Validate the total amount

Evidence that may be used in establishing the representation faithfulness for financial statement line items include any one or more of:

    • 11. Revenue & Account Receivables: may include contracts, purchase order (in pdf, word, excel, email), EDI messages (for PO, shipping, remittance advice from bank), Bill of Lading, packing list, packing slip, delivery confirmation, consignment agreements, payment details including cash receipts, bank statements, check image, remittance advice.
    • 12. Expense and Account Payable: may include contracts, invoice, purchase order, EDI message, Bill of Lading, packing list, packing slip, delivery confirmation
    • 13. Journal Entries: various forms of supporting documents including email, spreadsheets, word and pdf documents, receipts, contracts, etc.
    • 14. Cash & Cash Equivalents: bank statements
    • 15. Inventory: may include image and video of the warehouse and store shelves, packing list/packing slip, and return.
    • 16. PPE, Lease and Depreciation: may include lease agreements, various documentation on income and expense associated with the specific PPE asset.

This example embodiment concerns representation faithfulness for data in the financial system related to revenue and receivables, namely, the initial agreement (such as contracts and POs), evidence of fulfillment of obligations (such as shipping), and evidence of payment details (such as bank statements). Note that some of this evidence may come from one or more third parties, such as evidence coming directly from a bank or a shipping company. Some of the evidence could be semi-structured (such as EDI or XML/EDI messages).

The process for validating representation faithfulness may include one or more of the following steps:

Step One—Establish Subset of Data within the Financial System

The system may establish a subset of data within the financial systems (such as an ERP) within the specified accounting period for which validation of representation faithfulness through vouching and tracing between data in financial systems and various evidence is to be performed.

The accounting period could be the full year, a single quarter, a month, a week, or a single day (e.g., in the case of continuous control and monitoring).

The subset selection may be based on a combination of best practice, prior knowledge of the specific industry, and knowledge of the specific client. The data within the financial systems that are related to the revenue and receivable FSLI audit and might require validation of faithful representation include sales order tables, sales invoice tables, and payment journal tables.

The window for conducting representation faithfulness validation might start slightly earlier and end slightly later than the accounting period, based on the cutoff criteria, best practice, and industry- and client-specific knowledge.
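
As a minimal sketch only, and assuming a hypothetical record structure with a posting_date field, the following Python snippet illustrates selecting the subset of financial-system entries whose dates fall within an accounting period widened by a cutoff buffer, as described above.

    # Illustrative sketch only: selecting the subset of financial-system entries
    # whose posting dates fall within an accounting period widened by a cutoff
    # buffer. The record structure and field names are hypothetical.
    from datetime import date, timedelta

    def select_entries(entries, period_start, period_end, cutoff_buffer_days=5):
        """Return entries dated within [period_start - buffer, period_end + buffer]."""
        window_start = period_start - timedelta(days=cutoff_buffer_days)
        window_end = period_end + timedelta(days=cutoff_buffer_days)
        return [e for e in entries if window_start <= e["posting_date"] <= window_end]

    entries = [
        {"id": "SO-1001", "posting_date": date(2021, 3, 28)},
        {"id": "SO-1002", "posting_date": date(2021, 4, 2)},   # caught by the buffer
        {"id": "SO-1003", "posting_date": date(2021, 6, 15)},  # outside the window
    ]
    print(select_entries(entries, date(2021, 1, 1), date(2021, 3, 31)))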

Step Two—Establish Set of Evidence Required for Validation

The system may establish a set of (potentially multi-modal) evidence (including its provenance/lineage) that may be required to validate the representation faithfulness.

Representation faithfulness for a sales order is often based on purchase orders or contracts. A purchase order might be received in the form of a pdf, an email, an excel spreadsheet, a word document, or an EDI message. Different parts of a sales order might come from different sources with different modalities. As an example, a sales order in the automotive parts manufacturing industry could have its price established by a sales contract while its quantity is received through an EDI message for just-in-time delivery. Alternatively, the pricing for a commodity order could be based on a daily pricing table as opposed to an agreement made in advance.

Representation faithfulness for a shipping confirmation could be based on a Bill of Lading, proof of delivery, packing list, or packing slip. It can be in the form of an EDI message or obtained from a third-party service provider (such as shippo.com).

Representation faithfulness for a payment journal could be based on various forms of payment details, including a check image, remittance advice, a bank statement, a daily ACH report, a daily credit card settlement report, or an EDI message, or it may be obtained from a third-party service provider (such as plaid.com).

Validation of faithful representation may include using both the content and the metadata associated with the evidence. In particular, the provenance and lineage of the evidence can greatly facilitate the validation process (see US 20100114628 A1; US 20100114629 A1). Provenance or lineage for the evidence captures everything related to the evidence from the moment it is created. It should keep track of where the evidence sits, who accessed it and when, and the operations and transformations that might have been applied. Validation of representation faithfulness can be conducted entirely on the provenance alone if the provenance captures everything from the time the evidence was created to the moment it was loaded into the financial systems, and if the provenance/lineage can be demonstrated to be non-alterable, such that non-repudiation can be fully established.

Note that the finance system often captures the final state of an agreement or a transaction. On the other hand, the system may need to trace through the entire history of evolution of the evidence, starting from the original agreement and followed by subsequent (often multiple) amendments, in order to fully substantiate the current state in the financial system.

Step Three—Collection of Evidence and Creation of Feature Vectors

The system may collect evidence (each piece of evidence may include one or more fields) associated with entries in the financial system (each with one or more fields) at the transaction level, where the association is defined by a similarity metric between the evidence and the data in the financial system:

Each piece of evidence and each entry from the financial systems may be represented by one or more feature vectors, for example as follows:


v = (v_1, v_2, . . . , v_N)

Feature vectors may be extracted, derived, or computed from each piece of evidence and from the transaction data in the finance system.

Feature vectors may be computed from the metadata, file names, or other contextual information obtained when retrieving the documents from the repository of those documents. Feature vectors may also be computed from the "content" extracted from the evidence, such as the purchase order #, invoice #, payment journal ID, amount, and customer name. Feature vectors may also be computed from additional contextual information (both endogenous and exogenous) that might be pertinent. A feature vector for field-level evidence could be the value of the field itself.

As an example, the feature vector for a pdf file SO0001238-PO.pdf could be (SO0001238, PO) indicating that this is supposed to be the PO associated with sales order #1238. However, the confirmation of the association will be dependent on additional verification of the content.

The feature vector can be defined based on the content of the documents, such as (PO #, Customer Name, Date, Total Amount).
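
For illustration only, the following Python sketch derives simple feature tuples for a piece of evidence from its file name and from fields extracted from its content, following the SO0001238-PO.pdf example above. The file-naming pattern, regular expression, and field names are assumptions made for the example.

    # Illustrative sketch only: simple feature tuples for a piece of evidence,
    # one derived from the file name and one from fields extracted from content.
    # The naming convention, regular expression, and field names are assumptions.
    import re

    def filename_features(filename):
        """Parse names like 'SO0001238-PO.pdf' into (sales_order_id, doc_type)."""
        match = re.match(r"(SO\d+)-([A-Z]+)\.pdf$", filename)
        return match.groups() if match else (None, None)

    def content_features(extracted):
        """Build a content-level feature tuple (PO #, customer, date, total)."""
        return (
            extracted.get("po_number"),
            extracted.get("customer_name"),
            extracted.get("date"),
            extracted.get("total_amount"),
        )

    print(filename_features("SO0001238-PO.pdf"))  # ('SO0001238', 'PO')
    print(content_features({"po_number": "PO-1238", "customer_name": "Acme Corp",
                            "date": "2021-03-15", "total_amount": 1050.00}))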

Step Four—Compute Similarity Metric

The association between evidence and the entry within the financial system may be based on the computation of a similarity metric between the evidence and the entry in the financial system. A few potential similarity metrics may be defined as follows:

    • Cosine similarity: The cosine similarity may be advantageous because even if two similar vectors are far apart by the Euclidean distance (e.g., due to the size of the document), they may still be oriented closer together. The smaller the angle, the higher the cosine similarity.

cos θ = (a · b) / (‖a‖ ‖b‖) = (Σ_{i=1}^{n} a_i b_i) / (√(Σ_{i=1}^{n} a_i²) · √(Σ_{i=1}^{n} b_i²))

    • Manhattan distance: The Manhattan distance is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates; in simple terms, it is the total of the differences between the x-coordinates and the y-coordinates:

|x_1 − x_2| + |y_1 − y_2|

    • Euclidean distance: The Euclidean distance between two points in either the plane or 3-dimensional space measures the length of a segment connecting the two points:

√((x_2 − x_1)² + (y_2 − y_1)²)

Weights used in the similarity metric can be prescribed or trained using a machine learning model, potentially with continuous learning based on the observed performance of the similarity metric.
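
As an illustrative sketch only, the following Python functions implement the three measures described above for numeric feature vectors; the optional per-dimension weights in the cosine similarity reflect the possibility of prescribed or learned weights, and the sample vectors are hypothetical.

    # Illustrative sketch only: the cosine similarity, Manhattan distance, and
    # Euclidean distance described above, applied to numeric feature vectors.
    # The optional per-dimension weights and the sample vectors are assumptions.
    import math

    def cosine_similarity(a, b, w=None):
        w = w or [1.0] * len(a)
        dot = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
        norm_a = math.sqrt(sum(wi * ai * ai for wi, ai in zip(w, a)))
        norm_b = math.sqrt(sum(wi * bi * bi for wi, bi in zip(w, b)))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def manhattan_distance(a, b):
        return sum(abs(ai - bi) for ai, bi in zip(a, b))

    def euclidean_distance(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    v_erp = [1238.0, 1050.00, 20210315.0]   # e.g., PO #, amount, date as a number
    v_doc = [1238.0, 1050.00, 20210316.0]
    print(cosine_similarity(v_erp, v_doc))   # close to 1.0
    print(manhattan_distance(v_erp, v_doc))  # 1.0
    print(euclidean_distance(v_erp, v_doc))  # 1.0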

Computing similarity between feature vector(s) representing the evidence and the data from financial systems may require making “soft” decisions in the process.

Using the example of vouching & tracing between a sales order in the ERP and purchase order documents, an initial decision may be made on the most appropriate k purchase order documents that will be used for the matching (based on the top-k queries). Subsequently, matching may be performed within each purchase order document for the purchase order number, customer, delivery & payment terms, and individual line items. Some of these items may need to be further explored, such as the customer and line items. The overall confidence score for each of the items may influence the overall rank of the evidence. As an example, suppose the top-2 documents that might be candidates for the entry in the financial system are doc_1 and doc_2, with similarity scores (or confidence scores) of c_11 and c_12. The subsequent evaluation of the combined confidence score for the next level of evaluation yields c_21 and c_22. The overall confidence for doc_1 becomes c_11*c_21 and for doc_2 becomes c_12*c_22, using the definition of fuzzyAND. Consequently, the relative rank between these two pieces of evidence as potential matches to the entry in the financial system could change. This approach allows multiple pieces of potential evidence to be evaluated simultaneously without pruning them prematurely.
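
The following Python sketch, provided for illustration only, re-ranks two hypothetical candidate documents by combining their per-stage confidence scores with the product form of fuzzyAND, mirroring the doc_1/doc_2 example above; the document names and scores are assumptions.

    # Illustrative sketch only: re-ranking top-k candidate documents by combining
    # per-stage confidence scores with the product form of fuzzyAND.
    # Document names and scores are hypothetical.
    from functools import reduce

    def fuzzy_and(x, y):
        """Product form of fuzzy AND; min(x, y) is an alternative definition."""
        return x * y

    candidates = {
        "doc_1": [0.90, 0.60],  # c_11, then c_21 from the next-level evaluation
        "doc_2": [0.80, 0.95],  # c_12, then c_22 from the next-level evaluation
    }

    overall = {name: reduce(fuzzy_and, scores) for name, scores in candidates.items()}
    ranked = sorted(overall.items(), key=lambda item: item[1], reverse=True)
    print(ranked)  # doc_2 (0.76) now outranks doc_1 (0.54)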

Additional approaches may be based on dynamic programming with backtracking. (See Li, C. S., Chang, Y. C., Smith, J. R., Bergman, L. D. and Castelli, V., 1999, December, Framework for efficient processing of content-based fuzzy Cartesian queries, Storage and Retrieval for Media Databases 2000 (Vol. 3972, pp. 64-75), International Society for Optics and Photonics; Natsev, A., Chang, Y. C., Smith, J. R., Li, C. S. and Vitter, J. S., 2001, August, Supporting incremental join queries on ranked inputs, VLDB (Vol. 1, pp. 281-290); U.S. Pat. No. 6,778,946 (algorithm for identifying combination).)

Step Five—Establish Representation Faithfulness

Representation faithfulness may be established by the system as follows. Representation faithfulness may be established based on the ontological representation of the entries in the financial system to the individual fields within the entries. A measure of representation faithfulness may be indicated by the confidence (e.g., based on the similarity metric computed) that agreement/similarity exists between the evidence and data in the financial system. Note that confidence scores when cascading two feature vectors can be obtained through fuzzy AND logic of the confidence level for each feature vector:

    • fuzzyAND(x, y) = min(x, y)
    • fuzzyAND(x, y) = x * y

Sufficiency of the representation faithfulness at each level of the entry may be established through explicit specification and/or implicit models. The association between evidence and transactions/entries in the financial systems may be one-to-one, one-to-many, many-to-one, and many-to-many. Note that the representation faithfulness can be based on direct evidence and/or circumstantial evidence.

The level of matching and the confidence level between two entities, whether each is a numerical value, a string, or a date, could be computed as explained below.

Fuzzy matching for numeric values may be calculated as follows. Given two values A and B, each with a confidence level:


A = (value_A, confidence_A), where confidence_A is in [0, 1]


B = (value_B, confidence_B), where confidence_B is in [0, 1]


Match score between A and B = max{1 − |value_A − value_B| / max_diff, 0} × 100%

Note that max_diff is a parameter to be set; it is the difference at or beyond which the match score becomes 0.


Confidence = min{confidence_A, confidence_B}
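
A minimal Python sketch of the numeric fuzzy match above follows; the function name and the example values (e.g., an OCR-extracted invoice total compared with an ERP amount) are hypothetical.

    # Illustrative sketch only: fuzzy matching of two numeric values that carry
    # confidence levels, following the match-score and confidence formulas above.

    def fuzzy_numeric_match(a, b, max_diff):
        """a and b are (value, confidence) pairs; confidences lie in [0, 1].

        Returns (match_score_percent, combined_confidence); max_diff is the
        difference at or beyond which the match score falls to 0.
        """
        value_a, conf_a = a
        value_b, conf_b = b
        match_score = max(1 - abs(value_a - value_b) / max_diff, 0) * 100
        confidence = min(conf_a, conf_b)
        return match_score, confidence

    # e.g., an OCR-extracted invoice total (confidence 0.9) vs. the ERP amount
    print(fuzzy_numeric_match((1050.00, 0.9), (1048.00, 1.0), max_diff=10))
    # approximately (80.0, 0.9)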

Fuzzy matching for strings may be based on the Levenshtein distance. Three types of string mismatch, in which a character is inserted, deleted, or substituted, are as follows:

    • insertion: co*t→coat
    • deletion: coat→co*t
    • substitution: coat→cost

Levenshtein Distance may be referred to as edit distance, and may count the minimum number of operations (edits) required to transform one string into the other. As an example, the Levenshtein distance between “kitten” and “sitting” is 3. A minimal edit script that transforms the former into the latter is:

kitten → sitten (substitute "s" for "k")

sitten → sittin (substitute "i" for "e")

sittin → sitting (insert "g" at the end)

The Fuzzywuzzy open-source Python library may be leveraged for this computation:

Confidence = min{confidence_A, confidence_B}

Match_score = fuzz.ratio = distance(st1, st2) / max(len(st1), len(st2)) × 100
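
For illustration, the following self-contained Python sketch computes the Levenshtein distance with standard dynamic programming and applies the normalized score given above; the fuzzywuzzy library mentioned in the text provides fuzz.ratio as an off-the-shelf similarity. Function names are hypothetical.

    # Illustrative sketch only: a standard dynamic-programming Levenshtein
    # distance and the normalized score given above. Function names are
    # hypothetical; fuzzywuzzy's fuzz.ratio offers an off-the-shelf similarity.

    def levenshtein(s1, s2):
        """Minimum number of insertions, deletions, and substitutions."""
        prev = list(range(len(s2) + 1))
        for i, c1 in enumerate(s1, start=1):
            curr = [i]
            for j, c2 in enumerate(s2, start=1):
                cost = 0 if c1 == c2 else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]

    def normalized_score(s1, s2):
        """Edit distance normalized by the longer string length, scaled to 0-100."""
        return levenshtein(s1, s2) / max(len(s1), len(s2)) * 100

    print(levenshtein("kitten", "sitting"))       # 3
    print(normalized_score("kitten", "sitting"))  # approximately 42.9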

Fuzzy matching for dates may be calculated as follows. Assuming the tolerance window is M days, beyond which the dates are considered fully mismatched, the system may compute the following:


A = (date_A, confidence_A), where confidence_A is in [0, 1]


B = (date_B, confidence_B), where confidence_B is in [0, 1]


Match score between A and B = max{1 − |date_A − date_B| / M, 0} × 100%


Confidence = min{confidence_A, confidence_B}
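
A minimal Python sketch of the date fuzzy match above follows, assuming a hypothetical tolerance window M (m_days) and example dates such as a Bill of Lading date compared with an ERP posting date.

    # Illustrative sketch only: fuzzy matching of two dates that carry confidence
    # levels, with a tolerance window of M days per the formulas above.
    from datetime import date

    def fuzzy_date_match(a, b, m_days):
        """a and b are (date, confidence) pairs; confidences lie in [0, 1]."""
        date_a, conf_a = a
        date_b, conf_b = b
        diff_days = abs((date_a - date_b).days)
        match_score = max(1 - diff_days / m_days, 0) * 100
        confidence = min(conf_a, conf_b)
        return match_score, confidence

    # e.g., the shipping date on a Bill of Lading vs. the ERP posting date
    print(fuzzy_date_match((date(2021, 3, 15), 0.95), (date(2021, 3, 17), 1.0),
                           m_days=5))  # approximately (60.0, 0.95)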

In the systems described herein, vouching and tracing may be carried out simultaneously. The systems herein may go through every journal entry (in G/L, AP, AR, or other areas) in the ERP, validate its supporting documents, and trace each source document to the corresponding entry in the ERP simultaneously. This simultaneous vouching and tracing may minimize the number of I/O operations performed against the ERP system and the content management systems, as opposed to doing the two separately and requiring twice as many accesses to the underlying systems.
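
As a simplified, illustrative sketch only, the following Python snippet performs vouching and tracing in a single pass over hypothetical ERP entries and documents, associating them by a shared PO number; the similarity-based association described above is reduced here to an exact key lookup.

    # Illustrative sketch only: a single pass that both vouches (ERP entry to its
    # best supporting document) and traces (document to its ERP entry), reading
    # each record and each document once. Association here is by a shared PO
    # number, a simplification of the similarity-based matching described above.

    def vouch_and_trace(erp_entries, documents):
        docs_by_po = {d["po_number"]: d["doc_id"] for d in documents}
        entries_by_po = {e["po_number"]: e["entry_id"] for e in erp_entries}
        vouched = {e["entry_id"]: docs_by_po.get(e["po_number"]) for e in erp_entries}
        traced = {d["doc_id"]: entries_by_po.get(d["po_number"]) for d in documents}
        return vouched, traced

    erp_entries = [{"entry_id": "JE-1", "po_number": "PO-1238"},
                   {"entry_id": "JE-2", "po_number": "PO-1300"}]  # unsupported entry
    documents = [{"doc_id": "SO0001238-PO.pdf", "po_number": "PO-1238"},
                 {"doc_id": "SO0001400-PO.pdf", "po_number": "PO-1400"}]  # untraced

    vouched, traced = vouch_and_trace(erp_entries, documents)
    print(vouched)  # "JE-2" maps to None: no supporting document found
    print(traced)   # "SO0001400-PO.pdf" maps to None: no corresponding ERP entry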

In some embodiments, any one or more of the data processing operations, cross-validation procedures, vouching procedures, and/or other methods/techniques depicted herein may be performed, in whole or in part, by one or more of the systems (and/or components/modules thereof) disclosed herein.

Computer

FIG. 5 illustrates an example of a computer, according to some embodiments. Computer 500 can be a component of a system for providing an AI-augmented auditing platform including techniques for applying a composable assurance integrity framework. In some embodiments, computer 500 may execute any one or more of the methods described herein.

Computer 500 can be a host computer connected to a network. Computer 500 can be a client computer or a server. As shown in FIG. 5, computer 500 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server, or handheld computing device, such as a phone or tablet. The computer can include, for example, one or more of processor 510, input device 520, output device 530, storage 540, and communication device 560. Input device 520 and output device 530 can correspond to those described above and can either be connectable or integrated with the computer.

Input device 520 can be any suitable device that provides input, such as a touch screen or monitor, keyboard, mouse, or voice-recognition device. Output device 530 can be any suitable device that provides an output, such as a touch screen, monitor, printer, disk drive, or speaker.

Storage 540 can be any suitable device that provides storage, such as an electrical, magnetic, or optical memory, including a random access memory (RAM), cache, hard drive, CD-ROM drive, tape drive, or removable storage disk. Communication device 560 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or card. The components of the computer can be connected in any suitable manner, such as via a physical bus or wirelessly. Storage 540 can be a non-transitory computer-readable storage medium comprising one or more programs, which, when executed by one or more processors, such as processor 510, cause the one or more processors to execute methods described herein.

Software 550, which can be stored in storage 540 and executed by processor 510, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the systems, computers, servers, and/or devices as described above). In some embodiments, software 550 can include a combination of servers such as application servers and database servers.

Software 550 can also be stored and/or transported within any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute instructions associated with the software from the instruction execution system, apparatus, or device. In the context of this disclosure, a computer-readable storage medium can be any medium, such as storage 540, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.

Software 550 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute instructions associated with the software from the instruction execution system, apparatus, or device. In the context of this disclosure, a transport medium can be any medium that can communicate, propagate, or transport programming for use by or in connection with an instruction execution system, apparatus, or device. The transport-readable medium can include but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation medium.

Computer 500 may be connected to a network, which can be any suitable type of interconnected communication system. The network can implement any suitable communications protocol and can be secured by any suitable security protocol. The network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.

Computer 500 can implement any operating system suitable for operating on the network. Software 550 can be written in any suitable programming language, such as C, C++, Java, or Python. In various embodiments, application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service, for example.

Appendix A shows additional information regarding AI-augmented auditing platforms including techniques for applying a composable assurance integrity framework, in accordance with some embodiments.

Following is a list of enumerated embodiments.

    • Embodiment 1. A system for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, the system comprising one or more processors configured to cause the system to:
      • receive a first data set representing a plurality of statements;
      • receive a second data set comprising corroborating evidence related to one or more of the plurality of statements; and
      • apply one or more integrity analysis models to the first data set and the second data set in order to generate output data comprising an assessment of risk.
    • Embodiment 2. The system of embodiment 1, wherein the output data comprises an assessment of risk that one or more of the plurality of statements represents a material misstatement.
    • Embodiment 3. The system of any one of embodiments 1-2, wherein applying the one or more integrity analysis models comprises applying one or more process integrity analysis models to generate output data indicating whether one or more process integrity criteria are satisfied.
    • Embodiment 4. The system of embodiment 3, wherein applying the one or more process integrity analysis models comprises determining whether the first set of data indicates that one or more process integrity criteria regarding a predefined procedure are satisfied.
    • Embodiment 5. The system of any one of embodiments 3-4, wherein applying the one or more process integrity analysis models comprises determining whether the first set of data indicates that one or more temporal process integrity criteria are satisfied.
    • Embodiment 6. The system of any one of embodiments 3-5, wherein applying the one or more process integrity analysis models comprises determining whether the first set of data indicates that one or more internal-consistency process integrity criteria are satisfied.
    • Embodiment 7. The system of any one of embodiments 1-6, wherein applying the one or more integrity analysis models comprises applying one or more data integrity analysis models to generate an assessment of fidelity of information represented by the first data set to information represented by the second data set.
    • Embodiment 8. The system of embodiment 7, wherein applying the one or more data integrity analysis models is based on exogenous data in addition to the first data set and the second data set.
    • Embodiment 9. The system of any one of embodiments 1-8, wherein applying the one or more integrity analysis models comprises applying one or more policy integrity models to generate output data comprising an adjudication according to one or more policy integrity criteria, wherein the adjudication is based on all or part of one or both of: the plurality of statements and the corroborating evidence.
    • Embodiment 10. The system of embodiment 9, wherein the adjudication rendered by the one or more policy integrity models is based on an assurance knowledge substrate including data representing one or more of the following: industry practice of an industry related to one or more of the plurality of statements, historical behavior related to one or more parties relevant to one or more of the plurality of statements, one or more accounting policies, and one or more auditing standards.
    • Embodiment 11. The system of any one of embodiments 1-10, wherein the assessment of a risk is associated with a level selected from: a transaction-level, an account level, and a line-item level.
    • Embodiment 12. The system of any one of embodiments 1-11, wherein generating the assessment of a risk is based at least in part on an assessed level of risk attributable to one or more automated processes used in generating or processing one or both of the first and second data sets.
    • Embodiment 13. The system of any one of embodiments 1-12, wherein generating the assessment of risk comprises performing full-population testing on the first data set and the second data set.
    • Embodiment 14. The system of any one of embodiments 1-13, wherein generating the assessment of risk comprises:
      • applying one or more process integrity models based on ERP data included in one or both of the first data set and the second data set; and
      • applying one or more data integrity models based on corroborating evidence in the second data set.
    • Embodiment 15. The system of any one of embodiments 1-14, wherein the one or more processors are configured to apply the assessment of the risk in order to configure a characteristic of a target sampling process.
    • Embodiment 16. The system of any one of embodiments 1-15, wherein the one or more processors are configured to apply one or more common modules across two or more models selected from: a data integrity model, a process integrity model, and a policy integrity model.
    • Embodiment 17. The system of any one of embodiments 1-16, wherein the one or more processors are configured to apply an assurance insight model in order to generate, based at least in part on the generated assessment of risk of material misstatement, assurance insight data.
    • Embodiment 18. The system of embodiment 17, wherein the one or more processors are configured to apply an assurance recommendation model to generate, based at least in part on the assurance insight data, recommendation data.
    • Embodiment 19. The system of any one of embodiments 1-18, wherein the one or more processors are configured to:
      • receive a user input comprising instructions regarding a set of criteria to be applied; and
      • apply the one or more integrity analysis models in accordance with the received instructions regarding the set of criteria to be applied.
    • Embodiment 20. The system of any one of embodiments 1-19, wherein applying the one or more integrity analysis models comprises:
      • applying a first set of the one or more integrity analysis models to generate first result data; and
      • in accordance with the first result data, determining whether to apply a second subset of the one or more integrity analysis models.
    • Embodiment 21. A non-transitory computer-readable storage medium storing instructions for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
      • receive a first data set representing a plurality of statements;
      • receive a second data set comprising corroborating evidence related to one or more of the plurality of statements; and
      • apply one or more integrity analysis models to the first data set and the second data set in order to generate output data comprising an assessment of risk.
    • Embodiment 22. A method for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, wherein the method is performed by a system comprising one or more processors, the method comprising:
      • receiving a first data set representing a plurality of statements;
      • receiving a second data set comprising corroborating evidence related to one or more of the plurality of statements; and
      • applying one or more integrity analysis models to the first data set and the second data set in order to generate output data comprising an assessment of risk.
    • Embodiment 23. A system for generating an assessment of faithfulness of data, the system comprising one or more processors configured to cause the system to:
      • receive a first data set representing a plurality of statements;
        • receive a second data set comprising a plurality of items of corroborating evidence related to one or more of the plurality of statements;
        • generate, for each of the plurality of statements, a respective statement feature vector;
        • generate, for each of the plurality of items of corroborating evidence, a respective evidence feature vector;
      • compute, based on one or more of the statement feature vectors and based on one or more of the evidence feature vectors, a similarity metric representing a level of similarity between a set of one or more of the plurality of statements and a set of one or more of the plurality of items of corroborating evidence; and
      • generate, based on the similarity metric, output data representing an assessment of faithfulness of the first data set.
    • Embodiment 24. The system of embodiment 23, wherein generating the output data representing the assessment of faithfulness comprises performing a clustering operation on a set of similarity metrics including the similarity metric.
    • Embodiment 25. The system of any one of embodiments 23-24, wherein generating the respective statement feature vectors comprises encoding one or more of the following: content information included in the first data set, contextual information included in the first data set; and information received from a data source distinct from the first data set.
    • Embodiment 26. The system of any one of embodiments 23-25, wherein generating the respective evidence feature vectors comprises encoding one or more of the following: content information included in the second data set, contextual information included in the second data set; and information received from a data source distinct from the second data set.
    • Embodiment 27. The system of any one of embodiments 23-26, wherein the first data set is selected based on one or more data selection criteria for selecting a subset of available data within a system, wherein the one or more data selection criteria comprise one or more of the following: a data content criterion and a temporal criterion.
    • Embodiment 28. The system of any one of embodiments 23-27, wherein the second data set comprises data representing provenance of one or more of the items of corroborating evidence.
    • Embodiment 29. The system of any one of embodiments 23-28, wherein the second data set comprises one or more of the following: structured data, semi-structured data, and unstructured data.
    • Embodiment 30. The system of any one of embodiments 23-29, wherein the second data set comprises data representing multiple versions of a single document.
    • Embodiment 31. The system of any one of embodiments 23-30, wherein generating the similarity metric comprises comparing a single one of the statement feature vectors to a plurality of the evidence feature vectors.
    • Embodiment 32. The system of any one of embodiments 23-31, wherein generating the similarity metric comprises applying dynamic programming.
    • Embodiment 33. The system of any one of embodiments 23-32, wherein generating the similarity metric comprises applying one or more weights, wherein the weights are determined in accordance with one or more machine learning models.
    • Embodiment 34. The system of any one of embodiments 23-33, wherein generating the output data representing the assessment of faithfulness comprises generating a confidence score.
    • Embodiment 35. The system of any one of embodiments 23-34, wherein generating the output data representing the assessment of faithfulness comprises assessing sufficiency of faithfulness at a plurality of levels.
    • Embodiment 36. A non-transitory computer-readable storage medium storing instructions for generating an assessment of faithfulness of data, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
      • receive a first data set representing a plurality of statements;
      • receive a second data set comprising a plurality of items of corroborating evidence related to one or more of the plurality of statements;
        • generate, for each of the plurality of statements, a respective statement feature vector;
        • generate, for each of the plurality of items of corroborating evidence, a respective evidence feature vector;
      • compute, based on one or more of the statement feature vectors and based on one or more of the evidence feature vectors, a similarity metric representing a level of similarity between a set of one or more of the plurality of statements and a set of one or more of the plurality of items of corroborating evidence; and
      • generate, based on the similarity metric, output data representing an assessment of faithfulness of the first data set.
    • Embodiment 37. A method for generating an assessment of faithfulness of data, wherein the method is performed by a system comprising one or more processors, the method comprising:
      • receiving a first data set representing a plurality of statements;
        • receiving a second data set comprising a plurality of items of corroborating evidence related to one or more of the plurality of statements;
        • generating, for each of the plurality of statements, a respective statement feature vector;
        • generating, for each of the plurality of items of corroborating evidence, a respective evidence feature vector;
      • computing, based on one or more of the statement feature vectors and based on one or more of the evidence feature vectors, a similarity metric representing a level of similarity between a set of one or more of the plurality of statements and a set of one or more of the plurality of items of corroborating evidence; and
      • generating, based on the similarity metric, output data representing an assessment of faithfulness of the first data set.

This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ASSESSMENT OF VOUCHING EVIDENCE”, filed Jun. 30, 2022, Attorney Docket no. 13574-20068.00.

This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ADJUDICATION OF COMMERCIAL SUBSTANCE, RELATED PARTIES, AND COLLECTABILITY”, filed Jun. 30, 2022, Attorney Docket no. 13574-20069.00.

This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED DOCUMENT PROCESSING”, filed Jun. 30, 2022, Attorney Docket no. 13574-20071.00.

This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR PROVIDING AI-EXPLAINABILITY FOR PROCESSING DATA THROUGH MULTIPLE LAYERS”, filed Jun. 30, 2022, Attorney Docket no. 13574-20072.00.

Claims

1. A system for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, the system comprising one or more processors configured to cause the system to:

receive a first data set representing a plurality of statements;
receive a second data set comprising corroborating evidence related to one or more of the plurality of statements; and
apply one or more integrity analysis models to the first data set and the second data set in order to generate output data comprising an assessment of risk.

2. The system of claim 1, wherein the output data comprises an assessment of risk that one or more of the plurality of statements represents a material misstatement.

3. The system of claim 1, wherein applying the one or more integrity analysis models comprises applying one or more process integrity analysis models to generate output data indicating whether one or more process integrity criteria are satisfied.

4. The system of claim 3, wherein applying the one or more process integrity analysis models comprises determining whether the first set of data indicates that one or more process integrity criteria regarding a predefined procedure are satisfied.

5. The system of claim 3, wherein applying the one or more process integrity analysis models comprises determining whether the first set of data indicates that one or more temporal process integrity criteria are satisfied.

6. The system of claim 3, wherein applying the one or more process integrity analysis models comprises determining whether the first set of data indicates that one or more internal-consistency process integrity criteria are satisfied.

7. The system of claim 1, wherein applying the one or more integrity analysis models comprises applying one or more data integrity analysis models to generate an assessment of fidelity of information represented by the first data set to information represented by the second data set.

8. The system of claim 7, wherein applying the one or more data integrity analysis models is based on exogenous data in addition to the first data set and the second data set.

9. The system of claim 1, wherein applying the one or more integrity analysis models comprises applying one or more policy integrity models to generate output data comprising an adjudication according to one or more policy integrity criteria, wherein the adjudication is based on all or part of one or both of: the plurality of statements and the corroborating evidence.

10. The system of claim 9, wherein the adjudication rendered by the one or more policy integrity models is based on an assurance knowledge substrate including data representing one or more of the following: industry practice of an industry related to one or more of the plurality of statements, historical behavior related to one or more parties relevant to one or more of the plurality of statements, one or more accounting policies, and one or more auditing standards.

11. The system of claim 1, wherein the assessment of a risk is associated with a level selected from: a transaction-level, an account level, and a line-item level.

12. The system of claim 1, wherein generating the assessment of a risk is based at least in part on an assessed level of risk attributable to one or more automated processes used in generating or processing one or both of the first and second data sets.

13. The system of claim 1, wherein generating the assessment of risk comprises performing full-population testing on the first data set and the second data set.

14. The system of claim 1, wherein generating the assessment of risk comprises:

applying one or more process integrity models based on ERP data included in one or both of the first data set and the second data set; and
applying one or more data integrity models based on corroborating evidence in the second data set.

15. The system of claim 1, wherein the one or more processors are configured to apply the assessment of the risk in order to configure a characteristic of a target sampling process.

16. The system of claim 1, wherein the one or more processors are configured to apply one or more common modules across two or more models selected from: a data integrity model, a process integrity model, and a policy integrity model.

17. The system of claim 1, wherein the one or more processors are configured to apply an assurance insight model in order to generate, based at least in part on the generated assessment of risk of material misstatement, assurance insight data.

18. The system of claim 17, wherein the one or more processors are configured to apply an assurance recommendation model to generate, based at least in part on the assurance insight data, recommendation data.

19. The system of claim 1, wherein the one or more processors are configured to:

receive a user input comprising instructions regarding a set of criteria to be applied; and
apply the one or more integrity analysis models in accordance with the received instructions regarding the set of criteria to be applied.

20. The system of claim 1, wherein applying the one or more integrity analysis models comprises:

applying a first set of the one or more integrity analysis models to generate first result data; and
in accordance with the first result data, determining whether to apply a second subset of the one or more integrity analysis models.

21. A non-transitory computer-readable storage medium storing instructions for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, the instructions configured to be executed by a system comprising one or more processors to cause the system to:

receive a first data set representing a plurality of statements;
receive a second data set comprising corroborating evidence related to one or more of the plurality of statements; and
apply one or more integrity analysis models to the first data set and the second data set in order to generate output data comprising an assessment of risk.

22. A method for generating risk assessments based on data representing a plurality of statements and data representing corroborating evidence, wherein the method is performed by a system comprising one or more processors, the method comprising:

receiving a first data set representing a plurality of statements;
receiving a second data set comprising corroborating evidence related to one or more of the plurality of statements; and
applying one or more integrity analysis models to the first data set and the second data set in order to generate output data comprising an assessment of risk.
Patent History
Publication number: 20230004888
Type: Application
Filed: Jun 30, 2022
Publication Date: Jan 5, 2023
Applicant: PricewaterhouseCoopers LLP (New York, NY)
Inventors: Chung-Sheng LI (Scarsdale, NY), Winnie CHENG (West New York, NJ), Mark John FLAVELL (Madison, NJ), Lori Marie HALLMARK (Xenia, OH), Nancy Alayne LIZOTTE (Saline, MI), Kevin Ma LEONG (Randolph, NJ)
Application Number: 17/854,338
Classifications
International Classification: G06Q 10/06 (20060101); G06Q 30/00 (20060101);