Abstract: Various embodiments comprise systems and methods to maintain data consistency in a data pipeline. In some examples, a computing system comprises data monitoring circuitry that monitors the operations of the data pipeline. The data pipeline receives input data, processes the input data, and generates output data sets. The data monitoring circuitry receives and processes the output data sets to identify changes between the output data sets. Based on the changes, the data monitoring circuitry generates a consistency score that indicates a similarity level between the output data sets. The data monitoring circuitry determines when the consistency score exceeds a threshold value. When the consistency score exceeds the threshold value, the data monitoring circuitry generates and transfers an alert that indicates the ones of the output data sets whose consistency score exceeded the threshold value.
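The scoring and alerting flow described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the abstract does not define how the consistency score is computed, so the sketch assumes each output data set is a flat dictionary and that the score is the fraction of keys that changed between consecutive sets (so a larger score means more change). The names `Alert`, `consistency_score`, and `find_alerts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    index: int    # position of the output data set that triggered the alert
    score: float  # consistency score that exceeded the threshold


def consistency_score(previous: dict, current: dict) -> float:
    """Assumed score: fraction of keys added, removed, or changed."""
    keys = set(previous) | set(current)
    if not keys:
        return 0.0
    missing = object()  # sentinel so a missing key never equals a real value
    changed = sum(
        1 for key in keys
        if previous.get(key, missing) != current.get(key, missing)
    )
    return changed / len(keys)


def find_alerts(output_sets: list, threshold: float) -> list:
    """Compare each output data set with its predecessor and collect alerts."""
    alerts = []
    for i in range(1, len(output_sets)):
        score = consistency_score(output_sets[i - 1], output_sets[i])
        if score > threshold:
            alerts.append(Alert(index=i, score=score))
    return alerts
```

Calling `find_alerts` on a list of successive pipeline outputs returns one `Alert` per data set whose score exceeded the threshold, which a caller could then forward to an operator.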
Abstract: Various embodiments comprise systems and methods to indicate when errors occur in a data pipeline. In some examples, data monitoring circuitry monitors the operations of a data pipeline. The data monitoring circuitry ingests an output data set generated by the pipeline, compares the output data set to an expected output, identifies differences between the output data set and the expected output, and determines when the magnitude of the differences exceeds an error threshold. When the error threshold is exceeded, the data monitoring circuitry generates a graphical representation of the output data set, a graphical representation of the expected pipeline output, and an animated transition from the graphical representation of the expected pipeline output to the graphical representation of the output data set.
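The comparison and animated transition described in this abstract can be sketched as follows. The abstract does not specify how the difference magnitude is measured or how the animation is produced, so this sketch assumes numeric series of equal length, uses the mean absolute difference as the magnitude, and represents the animation as linearly interpolated frames that a plotting library could render. All function names are hypothetical.

```python
def difference_magnitude(actual, expected):
    """Assumed magnitude: mean absolute difference between paired values."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    if not actual:
        return 0.0
    return sum(abs(a - e) for a, e in zip(actual, expected)) / len(actual)


def transition_frames(expected, actual, steps=5):
    """Interpolate from expected (first frame) to actual (last frame)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    frames = []
    for step in range(steps + 1):
        t = step / steps  # 0.0 at the expected output, 1.0 at the actual output
        frames.append([e + t * (a - e) for a, e in zip(actual, expected)])
    return frames


def check_output(actual, expected, error_threshold, steps=5):
    """Return animation frames when the threshold is exceeded, else None."""
    if difference_magnitude(actual, expected) <= error_threshold:
        return None
    return transition_frames(expected, actual, steps)
```

Each frame is one intermediate state of the chart, so drawing the frames in order yields the transition from the expected output to the observed output.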
Abstract: Various embodiments include a data monitoring system that monitors the operations of a data pipeline. The data monitoring system receives a call from the data pipeline to ingest unprocessed data. The data monitoring system generates metadata based on the unprocessed data and responsively computes expected data outputs. The data monitoring system receives a call from the data pipeline to ingest processed data that comprises actual data outputs generated by the data pipeline. The data monitoring system generates output metadata based on the processed data. The data monitoring system compares the metadata for the expected data outputs with the output metadata for the actual data outputs and determines when the expected data outputs do not align with the actual data outputs. When the expected data outputs do not align with the actual data outputs, the data monitoring system generates and transfers an alert signifying the non-alignment.
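The two ingest calls and the metadata comparison described in this abstract can be sketched as follows. The abstract does not say what the metadata contains or how expected outputs are computed, so this sketch assumes rows are dictionaries, that metadata consists of a row count and column names, and that the caller supplies a hypothetical `expectation` function mapping input metadata to expected output metadata. `PipelineMonitor` and `describe` are illustrative names only.

```python
def describe(rows):
    """Assumed metadata: row count and sorted column names."""
    columns = sorted({key for row in rows for key in row})
    return {"row_count": len(rows), "columns": columns}


class PipelineMonitor:
    def __init__(self, expectation):
        # expectation: caller-supplied rule (hypothetical) that turns
        # input metadata into the metadata expected of the output.
        self._expectation = expectation
        self._expected = None

    def ingest_unprocessed(self, rows):
        """Called by the pipeline before processing."""
        self._expected = self._expectation(describe(rows))

    def ingest_processed(self, rows):
        """Called by the pipeline after processing.

        Returns a dict of mismatched fields as (expected, actual) pairs;
        an empty dict means the outputs align with expectations.
        """
        if self._expected is None:
            raise RuntimeError("ingest_unprocessed must be called first")
        actual = describe(rows)
        return {
            field: (value, actual.get(field))
            for field, value in self._expected.items()
            if actual.get(field) != value
        }
```

A non-empty result from `ingest_processed` plays the role of the alert: it names each metadata field where the actual outputs did not align with the expected outputs.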
Abstract: Systems and methods for data quality monitoring are provided. Various embodiments include a data monitoring system that integrates into a data pipeline. The data monitoring system may receive a call from the data pipeline to analyze data inputs entering the data pipeline. The data monitoring system can generate metadata describing the data inputs and compare the generated metadata with previously generated metadata to determine if the data inputs are historically consistent. The data monitoring system may return a consistency measure to the data pipeline. In further embodiments, the data monitoring system can generate metadata describing data outputs from the data pipeline and compare the output metadata to previously generated output metadata. In further embodiments, the data monitoring system may operate as a read-only entity in a database. The data monitoring system may monitor for changes in the database and determine when adverse changes occur in the database.
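The historical consistency check described in this abstract can be sketched as follows. The abstract does not define the metadata or the consistency measure, so this sketch assumes the inputs are a list of numbers (with `None` for missing values), that metadata consists of a row count, a null fraction, and a mean, and that the measure is the fraction of metadata fields lying within a relative tolerance of their historical average. The function names and the default tolerance are hypothetical.

```python
from statistics import mean


def describe_inputs(values):
    """Assumed metadata: row count, fraction of nulls, mean of present values."""
    present = [v for v in values if v is not None]
    total = len(values)
    return {
        "row_count": float(total),
        "null_fraction": (total - len(present)) / total if total else 0.0,
        "mean": float(mean(present)) if present else 0.0,
    }


def consistency_measure(current, history, tolerance=0.2):
    """Fraction of fields within `tolerance` (relative) of the historical mean.

    Returns 1.0 when there is no history to compare against. With a zero
    baseline, any non-zero value counts as inconsistent in this simple rule.
    """
    if not history:
        return 1.0
    consistent = 0
    for field, value in current.items():
        baseline = mean(past[field] for past in history)
        if abs(value - baseline) <= tolerance * abs(baseline):
            consistent += 1
    return consistent / len(current)
```

The pipeline could call `describe_inputs` on each incoming batch, pass the result to `consistency_measure` along with metadata from earlier batches, and decide from the returned value whether to proceed.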