Abstract: Certain example embodiments relate to dashboards that help streamline and automate data quality management processes used with machine learning (ML) models and ML-enabled technology. A clean dataset is initialized from a dirty dataset. A search space is defined as the set of all possible combinations of the available error detection algorithms and data repair algorithms. A scoring function is defined to measure the performance of a given combination of error detection algorithm and data repair algorithm on the clean dataset. An ML model is trained using the clean dataset. The best error detection and data repair algorithms are selected based on an optimization over the set of all possible combinations and on the defined scoring function. The selected best error detection algorithm is applied to the clean dataset, and a repaired dataset is generated using the selected best data repair algorithm. The clean dataset is then set to the repaired dataset. This procedure is repeated until a condition is met.
Type:
Grant
Filed:
May 8, 2024
Date of Patent:
February 24, 2026
Assignee:
SOFTWARE GMBH
Inventors:
Mohamed Abdelaal, Samuel Lokadjaja, Arne Kreuz
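The select, apply, and repeat loop in the first abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented method: the toy one-column dataset, the two detectors and two repairers, the scoring function (a trivial "model" that predicts the column mean, scored against a hypothetical target mean), and the stopping condition (no score improvement, or `max_iters` rounds) are all hypothetical. Exhaustive enumeration with `itertools.product` stands in for whatever optimization the embodiments actually use.

```python
import math
from itertools import product
from typing import List, Optional

Dataset = List[Optional[float]]  # toy dataset: one numeric column; None marks a missing cell

def detect_missing(data: Dataset) -> List[int]:
    """Hypothetical detector: flag missing cells."""
    return [i for i, v in enumerate(data) if v is None]

def detect_missing_or_negative(data: Dataset) -> List[int]:
    """Hypothetical detector: flag missing or negative cells (assumed invalid here)."""
    return [i for i, v in enumerate(data) if v is None or v < 0]

def repair_zero(data: Dataset, errors: List[int]) -> Dataset:
    """Hypothetical repairer: overwrite flagged cells with 0.0."""
    bad = set(errors)
    return [0.0 if i in bad else v for i, v in enumerate(data)]

def repair_mean(data: Dataset, errors: List[int]) -> Dataset:
    """Hypothetical repairer: overwrite flagged cells with the mean of unflagged cells."""
    bad = set(errors)
    good = [v for i, v in enumerate(data) if i not in bad and v is not None]
    fill = sum(good) / len(good) if good else 0.0
    return [fill if i in bad else v for i, v in enumerate(data)]

DETECTORS = {"missing": detect_missing, "missing_or_negative": detect_missing_or_negative}
REPAIRERS = {"zero": repair_zero, "mean": repair_mean}

def score(data: Dataset, target_mean: float = 3.0) -> float:
    """Hypothetical scoring: train a mean predictor and compare it to a target mean."""
    if any(v is None for v in data):
        return -math.inf  # a model cannot be trained on missing values
    model = sum(data) / len(data)
    return -abs(model - target_mean)

def clean_loop(dirty, detectors, repairers, score_fn, max_iters=5, tol=1e-9):
    clean = list(dirty)  # the clean dataset is initialized from the dirty dataset
    best = score_fn(clean)
    history = []
    for _ in range(max_iters):
        # Search space: every (detector, repairer) combination, scored on the current data.
        candidates = []
        for (d_name, detect), (r_name, repair) in product(detectors.items(), repairers.items()):
            repaired = repair(clean, detect(clean))
            candidates.append((score_fn(repaired), d_name, r_name, repaired))
        s, d_name, r_name, repaired = max(candidates, key=lambda c: c[0])
        if s <= best + tol:  # assumed stopping condition: no further improvement
            break
        clean, best = repaired, s  # the clean dataset is set to the repaired dataset
        history.append((d_name, r_name, s))
    return clean, history
```

Calling `clean_loop([1.0, None, 3.0, -2.0, 5.0], DETECTORS, REPAIRERS, score)` selects `missing_or_negative` with `mean` in the first round, producing `[1.0, 3.0, 3.0, 3.0, 5.0]`; the second round finds no improvement, so the loop stops. Exhaustive enumeration only scales to small search spaces, which is why a real system would likely use a smarter optimizer.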
Abstract: Certain example embodiments relate to techniques for installing and/or updating an application instance on a target mainframe. Packaged code, received by the target mainframe from a source mainframe in a deployment package, is executable on the target mainframe. The deployment package also includes a memory dump of control blocks of a source application running on the source mainframe. The control blocks make up each program of the source application, and the source application corresponds to the application instance to be installed/updated. When executed, the packaged code is programmed to cause the target mainframe to: identify an area of memory of the target mainframe into which the control blocks from the memory dump are to be loaded; load the control blocks from the memory dump into the identified area of memory; and branch to a first address in the identified area of memory as part of the installing/updating.
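The three steps performed by the packaged code (identify an area, load the control blocks, branch) can be illustrated with a toy Python simulation. This is only a sketch under loud assumptions: target memory is modeled as a `bytearray` with a per-byte in-use map, the dump format (`MemoryDump` with `(offset, bytes)` control blocks and an `entry_offset`) is hypothetical, and since Python cannot branch into raw memory, `install` merely returns the address a real mainframe would branch to. No z/OS storage services are modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MemoryDump:
    """Hypothetical dump format: control blocks as (offset within the dump, raw bytes)."""
    blocks: List[Tuple[int, bytes]]
    size: int              # total bytes the dump occupies once loaded
    entry_offset: int = 0  # offset of the first instruction to branch to

class TargetMemory:
    """Toy model of target-mainframe storage with a per-byte in-use map."""
    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.used = [False] * size

    def find_free_area(self, length: int) -> int:
        """Return the start of the first free contiguous run of `length` bytes."""
        run = 0
        for addr, in_use in enumerate(self.used):
            run = 0 if in_use else run + 1
            if run == length:
                return addr - length + 1
        raise MemoryError("no contiguous free area large enough")

    def load(self, dump: MemoryDump, base: int) -> None:
        """Copy each control block to base + offset and mark the whole area as used."""
        for offset, payload in dump.blocks:
            if offset + len(payload) > dump.size:
                raise ValueError("control block extends past the dump size")
            self.data[base + offset: base + offset + len(payload)] = payload
        for addr in range(base, base + dump.size):
            self.used[addr] = True

def install(memory: TargetMemory, dump: MemoryDump) -> int:
    base = memory.find_free_area(dump.size)  # 1. identify the target area of memory
    memory.load(dump, base)                  # 2. load the control blocks into it
    return base + dump.entry_offset          # 3. address a real target would branch to
```

For example, with bytes 0 to 7 of a 32-byte memory already in use, a 6-byte dump is placed at address 8 and `install` returns 8 as the branch target.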
Abstract: Certain example embodiments relate to meta-learning based error detection. Base classifiers are provided for historical attributes in historical datasets. Each base classifier is trained to indicate the dirtiness of a value for its associated historical attribute. Historical clustering features are determined for each historical attribute and used to generate clusters and a clustering model, and the historical attributes are then associated with the clusters. For each dirty attribute in a dirty dataset, corresponding dirty clustering features are determined. The dirty attributes are assigned to the clusters using the corresponding dirty clustering features and the clustering model. The base classifiers associated with the clusters to which the dirty attributes were assigned are retrieved. Dirty features are extracted from the dirty dataset and selectively modified. The extracted dirty features are applied to the retrieved base classifiers to determine meta-features. A meta-classifier is trained using labeled meta-features.
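A rough sketch of the meta-learning flow in the preceding abstract follows, in pure Python. Everything concrete here is an assumption for illustration: the clustering features (fraction of numeric values and fraction of empty values per attribute), plain k-means initialized from the first k points, hand-written rules standing in for trained base classifiers, and a single-threshold learner standing in for the meta-classifier. Feature extraction and selective modification are omitted; each raw cell value is fed to the base classifiers directly.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Cell = Tuple[str, int]  # (attribute name, row index)

def is_float(v: str) -> bool:
    try:
        float(v)
        return True
    except ValueError:
        return False

def clustering_features(values: List[str]) -> List[float]:
    """Hypothetical attribute profile: [fraction numeric, fraction empty]."""
    n = len(values)
    return [sum(is_float(v) for v in values) / n, sum(v == "" for v in values) / n]

def nearest(centroids: List[List[float]], point: List[float]) -> int:
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))

def kmeans(points: List[List[float]], k: int, iters: int = 10) -> List[List[float]]:
    """Plain k-means; deterministic init from the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[List[float]]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

# Hypothetical base classifiers (1.0 = "looks dirty"), standing in for trained models.
def numeric_check(v: str) -> float:
    return 0.0 if is_float(v) else 1.0

def empty_check(v: str) -> float:
    return 1.0 if v == "" else 0.0

# Hypothetical historical attributes: sample values plus each attribute's base classifier.
HISTORICAL: Dict[str, Tuple[List[str], Callable[[str], float]]] = {
    "price": (["1.5", "2.0", "x", "3.1"], numeric_check),
    "name": (["ann", "", "bob", "cy"], empty_check),
    "qty": (["4", "5", "6", "?"], numeric_check),
    "city": (["rome", "oslo", "", "lima"], empty_check),
}

def mean(m: List[float]) -> float:
    return sum(m) / len(m) if m else 0.0

def train_meta_classifier(samples):
    """Stand-in learner: threshold on the mean meta-feature with the best training accuracy."""
    scores = [mean(m) for m, _ in samples]
    def accuracy(t: float) -> int:
        return sum((s >= t) == y for s, (_, y) in zip(scores, samples))
    best_t = max(sorted(set(scores)), key=accuracy)
    return lambda m: mean(m) >= best_t

def detect_errors(dirty: Dict[str, List[str]], labels: Dict[Cell, bool]) -> Set[Cell]:
    names = list(HISTORICAL)
    feats = [clustering_features(HISTORICAL[n][0]) for n in names]
    centroids = kmeans(feats, k=2)
    cluster_of = {n: nearest(centroids, f) for n, f in zip(names, feats)}
    meta: Dict[Cell, List[float]] = {}
    samples = []
    for attr, values in dirty.items():
        c = nearest(centroids, clustering_features(values))  # assign dirty attribute to a cluster
        clfs = [HISTORICAL[n][1] for n in names if cluster_of[n] == c]  # retrieve base classifiers
        for row, v in enumerate(values):
            meta[(attr, row)] = [clf(v) for clf in clfs]  # meta-features for this cell
            if (attr, row) in labels:
                samples.append((meta[(attr, row)], labels[(attr, row)]))
    meta_clf = train_meta_classifier(samples)
    return {cell for cell, m in meta.items() if meta_clf(m)}
```

With `dirty = {"amount": ["10", "abc", "7"], "tag": ["a", "", "c"]}` and four labeled cells, `amount` lands in the cluster of the numeric attributes and `tag` in the cluster of the text attributes, so each column is judged by the base classifiers of similar historical attributes.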
Abstract: A reinforcement learning based approach is used for data cleaning operations that form part of data preparation where machine learning (ML) technology is implemented. Features are extracted from a dirty dataset. A batch is sampled from the dirty dataset. If the sampled batch is determined to include at least one error, a set of one or more repair tools is selected from the available repair tools. The sampled batch is repaired using the selected set of repair tools. An ML model is trained based on the repaired sampled batch. A feedback metric is calculated based on the performance of the trained ML model on a validation dataset. The trained ML model is adjusted based on the calculated feedback metric (a loss). The approach is repeated, with the selection of the set of repair tools modified based on the calculated feedback metric (a reward).
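The feedback loop in the last abstract (sample, select tools, repair, train, score on validation, update the selection) can be approximated with a much simpler learner. The sketch below swaps in an epsilon-greedy multi-armed bandit for the reinforcement learning agent, uses a constant predictor `w` as the ML model, and folds training into one gradient step on the repaired batch; the validation MSE serves as the loss and its negation as the reward. The repair tools, the `LIMIT` validity bound, and the missing-value penalty are hypothetical, and feature extraction is omitted.

```python
import random
import statistics
from typing import Dict, List, Optional, Tuple

Batch = List[Optional[float]]
Action = Tuple[str, ...]

LIMIT = 100.0  # hypothetical validity bound: larger values are treated as errors

def impute(batch: Batch) -> Batch:
    """Fill missing values with the median of the present values."""
    present = [v for v in batch if v is not None]
    fill = statistics.median(present) if present else 0.0
    return [fill if v is None else v for v in batch]

def clip(batch: Batch) -> Batch:
    """Cap out-of-range values at LIMIT; missing values are left as-is."""
    return [None if v is None else min(v, LIMIT) for v in batch]

TOOLS = {"impute": impute, "clip": clip}
ACTIONS: List[Action] = [("impute",), ("clip",), ("impute", "clip")]  # candidate tool sets

def has_error(batch: Batch) -> bool:
    return any(v is None or v > LIMIT for v in batch)

def run(dirty, validation, episodes=200, batch_size=4, epsilon=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    value: Dict[Action, float] = {a: 0.0 for a in ACTIONS}  # running mean reward per action
    count: Dict[Action, int] = {a: 0 for a in ACTIONS}
    w = 0.0  # toy ML model: predicts the constant w
    for _ in range(episodes):
        batch = rng.sample(dirty, batch_size)
        action: Optional[Action] = None
        if has_error(batch):  # select and apply repair tools only when the batch has an error
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                action = max(ACTIONS, key=value.get)  # exploit
            for name in action:
                batch = TOOLS[name](batch)
        if any(v is None for v in batch):
            reward = -1e6  # hypothetical penalty: the model cannot train on missing values
        else:
            # Train: one gradient step on the batch MSE of the constant predictor.
            w -= lr * sum(2 * (w - v) for v in batch) / len(batch)
            # Feedback metric: validation MSE (a loss); its negation is the reward.
            reward = -sum((w - v) ** 2 for v in validation) / len(validation)
        if action is not None:
            count[action] += 1
            value[action] += (reward - value[action]) / count[action]
    return w, value, count
```

A bandit ignores state, so it only approximates the abstract's approach; a full RL agent would condition the tool selection on the extracted features of each batch.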