SYSTEM FOR BACKING OUT DATA

Info

Publication number: 20170199903
Type: Application
Filed: Jan 12, 2016
Publication Date: Jul 13, 2017
Inventors: Rodney Shannon Floyd (Cumming, GA), Ron G. Rambo (West Coxsackie, NY), Nancy M. Cerniglia (Craryville, NY)
Application Number: 14/993,679

Abstract

Some aspects disclosed herein are directed to, for example, a system and method of backing out data. The method may comprise determining one or more unique identifiers for data to be loaded from a source system to one or more databases. The method may comprise loading the data from the source system to the one or more databases, and the data may be loaded with the one or more unique identifiers. A computing device may determine that a subset of the data loaded to the one or more databases comprises invalid data. In response to determining that the subset of the data loaded to the one or more databases comprises invalid data, the computing device may determine one or more unique identifiers for the invalid data. The invalid data may be removed, from the one or more databases, based on the one or more unique identifiers for the invalid data.

Description


CROSS-REFERENCE TO RELATED APPLICATION

This application is related to U.S. patent application Ser. No. 14/262,014, filed Apr. 25, 2014, and entitled DATA LOAD PROCESS, and U.S. patent application Ser. No. 14/950,609, filed Nov. 24, 2015, and entitled DATA LOAD SYSTEM WITH DISTRIBUTED DATA FACILITY TECHNOLOGY. The related applications are hereby incorporated by reference in their entirety.

TECHNICAL FIELD

One or more aspects of the disclosure generally relate to computing devices, computing systems, and computer software. In particular, one or more aspects of the disclosure generally relate to computing devices, computing systems, and computer software that may be used to back out or otherwise remove data loaded to one or more databases.

BACKGROUND

Data may be loaded to various locations, such as one or more databases. However, some of this data may turn out to be bad data (e.g., corrupted data, unsecured data, old data, and the like). Removing (e.g., backing out) the bad data may be complex and time consuming and may consume a significant amount of processing power, especially if a large amount of data is loaded.

SUMMARY

The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosure. The summary is not an extensive overview of the disclosure. It is neither intended to identify key or critical elements of the disclosure nor to delineate the scope of the disclosure. The following summary merely presents some concepts of the disclosure in a simplified form as a prelude to the description below.

Some aspects as disclosed herein are directed to, for example, a system and method of backing out data. The method may comprise determining one or more unique identifiers for data to be loaded from a source system to one or more databases. The method may comprise loading the data from the source system to the one or more databases, and the data may be loaded with the one or more unique identifiers. A computing device may determine that a subset of the data loaded to the one or more databases comprises invalid data. In response to determining that the subset of the data loaded to the one or more databases comprises invalid data, the computing device may determine one or more unique identifiers for the invalid data. The invalid data may be removed, from the one or more databases, based on the one or more unique identifiers for the invalid data.

In some aspects, loading data may comprise loading a portion of the data from the source system to the one or more databases at a first time, and loading the portion of the data from the source system to the one or more databases at a second time. Furthermore, determining one or more unique identifiers may comprise determining first one or more unique identifiers for the portion of the data loaded at the first time and determining second one or more unique identifiers for the portion of the data loaded at the second time.

In some aspects, determining that the subset of the data loaded to the one or more databases comprises invalid data may comprise receiving, from a user device, an indication of the invalid data. The systems described herein may generate a user interface displayable by the user device. The user interface may comprise a data field for a user to provide the indication of the invalid data.

In some aspects, removing the invalid data may further be based on one or more load dates for the invalid data. The one or more load dates may comprise a time period, and removing the invalid data may comprise removing the invalid data that was loaded to the one or more databases during the time period.

After removing the invalid data, the method and system described herein may determine new data to load to the one or more databases. The new data may correspond to the invalid data removed from the one or more databases. The new data may be loaded to the one or more databases.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:

FIG. 1 illustrates an example operating environment in which various aspects of the disclosure may be implemented.

FIG. 2 illustrates another example operating environment in which various aspects of the disclosure may be implemented.

FIG. 3 illustrates an example operating environment for loading data and/or backing out data in which various aspects of the disclosure may be implemented.

FIG. 4 illustrates an example method for loading data and/or backing out data in which various aspects of the disclosure may be implemented.

FIG. 5 illustrates an example user interface for backing out data in which various aspects of the disclosure may be implemented.

DETAILED DESCRIPTION

In the following description of various illustrative embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which is shown, by way of illustration, various embodiments in which the claimed subject matter may be practiced. It is to be understood that other embodiments may be utilized, and that structural and functional modifications may be made, without departing from the scope of the present claimed subject matter.

FIG. 1 illustrates an example block diagram of a computing device 101 (e.g., a computer server, desktop computer, laptop computer, tablet computer, other mobile devices, and the like) in an example computing environment 100 that may be used according to one or more illustrative embodiments of the disclosure. The computing device 101 may have a processor 103 for controlling overall operation of the computing device 101 and its associated components, including for example random access memory (RAM) 105, read-only memory (ROM) 107, input/output (I/O) module 109, and memory 115.

I/O module 109 may include, e.g., a microphone, mouse, keypad, touch screen, scanner, optical reader, and/or stylus (or other input device(s)) through which a user of computing device 101 may provide input, and may also include one or more of a speaker for providing audio output and a video display device for providing textual, audiovisual, and/or graphical output. Software may be stored within memory 115 and/or other storage to provide instructions to processor 103 for enabling computing device 101 to perform various functions. For example, memory 115 may store software used by the computing device 101, such as an operating system 117, application programs 119, and an associated database 121. Additionally or alternatively, some or all of the computer executable instructions for computing device 101 may be embodied in hardware or firmware (not shown).

The computing device 101 may operate in a networked environment supporting connections to one or more remote computers, such as terminals 141 and 151. The terminals 141 and 151 may be personal computers or servers that include any or all of the elements described above with respect to the computing device 101. The network connections depicted in FIG. 1 include a local area network (LAN) 125 and a wide area network (WAN) 129, but may also include other networks. When used in a LAN networking environment, the computing device 101 may be connected to the LAN 125 through a network interface or adapter 123. When used in a WAN networking environment, the computing device 101 may include a modem 127 or other network interface for establishing communications over the WAN 129, such as the Internet 131. It will be appreciated that the network connections shown are illustrative and other means of establishing a communications link between the computers may be used. The existence of any of various well-known protocols such as TCP/IP, Ethernet, FTP, HTTP, HTTPS, and the like is presumed. Computing device 101 and/or terminals 141 or 151 may also be mobile terminals (e.g., mobile phones, smartphones, PDAs, notebooks, tablets, and the like) including various other components, such as a battery, speaker, and antennas (not shown).

FIG. 2 illustrates another example operating environment in which various aspects of the disclosure may be implemented. An illustrative system 200 for implementing methods according to the present disclosure is shown. As illustrated, system 200 may include one or more workstations 201. The workstations 201 may be used by, for example, agents or other employees of an institution (e.g., a financial institution) and/or customers of the institution. Workstations 201 may be local or remote, and are connected by one or more communications links 202 to computer network 203 that is linked via communications links 205 to server 204. In system 200, server 204 may be any suitable server, processor, computer, or data processing device, or combination of the same.

Computer network 203 may be any suitable computer network including the Internet, an intranet, a wide-area network (WAN), a local-area network (LAN), a wireless network, a digital subscriber line (DSL) network, a frame relay network, an asynchronous transfer mode (ATM) network, a virtual private network (VPN), or any combination of any of the same. Communications links 202 and 205 may be any communications links suitable for communicating between workstations 201 and server 204, such as network links, dial-up links, wireless links, hard-wired links, and the like.

FIG. 3 illustrates an example operating environment 300 for loading data and/or backing out data in which various aspects of the disclosure may be implemented. The operating environment 300 may include a plurality of computing devices, such as a source system 305, a landing zone 310, a data load system or module 315, database(s) 320, control table(s) 325, a data back out system or module 330, and user device(s) 345. Each of the computing devices may incorporate one or more of the components described above with reference to FIG. 1 and FIG. 2 (e.g., processors, RAM, ROM, memory, communication interfaces, and the like). The computing devices may communicate with one another and process data to load or back out data, as will be described in further detail below. One or more users 340 (e.g., administrators) may initiate and/or provide input for data load or back out. Other examples of computing devices that may be used in the operating environment 300 are provided in U.S. patent application Ser. No. 14/950,609, filed Nov. 24, 2015, and entitled DATA LOAD SYSTEM WITH DISTRIBUTED DATA FACILITY TECHNOLOGY, which is herein incorporated by reference in its entirety.

The operating environment 300 may include a source system 305. The source system 305 may include (e.g., store) the data (e.g., source files or other items) to be delivered to the database(s) 320 or other destinations. The source system 305 may comprise one or more servers having a database or flat file system for storing data. In some aspects, the source system 305 may send a request to the data load system or module 315 to send data to the database(s) 320.

The operating environment 300 may include a landing zone 310. The landing zone 310 may comprise, for example, a holding repository or other temporary storage location for the data to be delivered to the database(s) 320. The landing zone 310 may be, for example, in a computing device in a network, server or cloud and/or may comprise, for example, physical memory within the data load system 315.

The operating environment 300 may include a data load system or module 315. The data load system or module 315 may extract or otherwise receive the data from the landing zone 310 (or the source system 305). The data load system 315 may also validate and/or transform the data from the source system 305 to formats compatible with one or more of the database(s) 320.

The operating environment 300 may include one or more database(s) 320. The database(s) 320 may comprise destinations or other storage locations for the data from the source system 305. In some aspects, data from the source system 305 may be sent to the database(s) 320 via the landing zone 310 and/or the data load system or module 315. Aspects of loading data from the source system 305 to the database(s) 320 are described in more detail in U.S. patent application Ser. No. 14/950,609, filed Nov. 24, 2015, and entitled DATA LOAD SYSTEM WITH DISTRIBUTED DATA FACILITY TECHNOLOGY, which is herein incorporated by reference in its entirety.

The operating environment 300 may include one or more control table(s) 325. The data load system 315 may store information, such as metadata, associated with the loaded data and/or the process for loading the data in the control table(s) 325. The information stored in the control table(s) 325 may include unique identifier(s) for the loaded data, source code for the loaded data, an identifier of the user that sent the file, the name of the file, a start time stamp for the file (e.g., when data load began), an end time stamp for the file (e.g., when data load ended), and other data (e.g., metadata) describing the loaded data. As will be described in further detail below, a computing device may use the control table(s) 325 to search for bad data when the bad data is to be removed from the database(s) 320. The information for the control table(s) 325 may be generated and/or stored in the control table(s) 325 before, during, or after the data is loaded to the database(s) 320.
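By way of illustration only, an entry of the control table(s) 325 might be modeled as in the following minimal sketch. The class name, field names, and example values are hypothetical and are not taken from the disclosure; they merely mirror the kinds of information listed above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ControlTableEntry:
    """One row of a hypothetical control table (all names are assumptions)."""
    unique_id: str        # unique identifier assigned to the loaded data
    source_code: str      # source code for the loaded data
    sender_id: str        # identifier of the user that sent the file
    file_name: str        # name of the file
    load_start: datetime  # start time stamp (when the data load began)
    load_end: datetime    # end time stamp (when the data load ended)


# Example entry with made-up values.
entry = ControlTableEntry(
    unique_id="1234",
    source_code="SRC_A",
    sender_id="user340",
    file_name="accounts.xml",
    load_start=datetime(2016, 1, 11, 9, 0, 0),
    load_end=datetime(2016, 1, 11, 9, 5, 0),
)
load_duration_seconds = (entry.load_end - entry.load_start).total_seconds()
```

Keeping the entry immutable (frozen) is one possible design choice, on the assumption that a record of a completed load should not change after it is written.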

The operating environment 300 may include a data back out system or module 330. The back out system or module 330 may be configured to back out bad data loaded to the database(s) 320. As will be described in further detail below, the back out system or module 330 may communicate with and access information from the control table(s) 325 to identify the data to back out and the storage locations of the data to back out.

The operating environment 300 may include one or more user device(s) 345, such as a laptop computer, workstation, mobile device, and the like. As will be described in further detail below, the user device(s) 345 may display one or more user interfaces to user(s) 340, and the user interfaces may be used to initiate a data load back out process.

FIG. 4 illustrates an example method for loading data and/or backing out data in which various aspects of the disclosure may be implemented. The steps illustrated in FIG. 4 may be performed by one or more of the computing devices previously described.

In step 405, a computing device (e.g., the data load module 315 and/or the data back out module 330) may determine a unique identifier (e.g., metadata) for the data (e.g., data to be uploaded or data already uploaded). The data may be identified by file name, source code, and the like. The computing device may assign (e.g., link) the identifier to the data. The identifier may be unique to a date. As will be described in further detail below, the computing device may use the unique identifier and/or other information (e.g., a file name, a name of a program, a source type, and the like) to track backwards to identify data to be backed out.

When a file is being loaded or has already been loaded, the computing device may determine the unique identifier for the file to identify how the file is being or was loaded. The unique identifier may comprise a unique number (e.g., 1234), a unique string of letters (e.g., ABDFGZ), or a combination of numbers and letters and/or other characters. In some aspects, the computing device may generate the unique identifier based on various pieces of information related to the data, such as the source database (e.g., where the data comes from), the program that generated the data, the destination of the data being loaded, information from the source code (e.g., metadata), and the like.
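As one non-limiting sketch of how an identifier could be derived from information related to the data, the following example hashes the source, the generating program, the destination, and the load date. The function name, the inputs, the separator, and the use of a truncated SHA-256 digest are all assumptions made for illustration; a hash of this kind is only probabilistically unique, and a system may instead use, for example, a database sequence.

```python
import hashlib


def make_unique_identifier(source: str, program: str,
                           destination: str, load_date: str) -> str:
    """Derive an identifier from load attributes (hypothetical scheme)."""
    # Join the fields with a separator that is unlikely to occur in the
    # inputs, so that ("ab", "c") and ("a", "bc") produce different keys.
    key = "\x1f".join([source, program, destination, load_date])
    # Keep the first 12 hex characters; the length is an arbitrary choice.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12].upper()


# The same data loaded on two different dates receives two identifiers.
id_first = make_unique_identifier("SRC_A", "loader_v1", "operational_db", "2016-01-11")
id_second = make_unique_identifier("SRC_A", "loader_v1", "operational_db", "2016-01-12")
```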

In some aspects, the computing device may determine and assign a unique identifier to data for each date an event occurs for the file, such as the date that the file is uploaded, the date that the file is reviewed or verified, and the like. Accordingly, uploaded data may be assigned two or more unique identifiers if the data is uploaded (or some other event occurs) two or more times. For example, the computing device may assign the identifiers 1234 and 1235 to the same uploaded data (e.g., duplicative or semi-duplicative data, such as a record) to indicate the two dates that the data has been uploaded. Generally, the same record may be assigned multiple identifiers if something in the record is different for each event, such as if there are two different upload dates for the data, there are two different versions of the data, and the like. In some aspects, one of the unique identifiers for a particular piece of data, such as the first unique identifier assigned by the computing device, may be the primary identifier (e.g., key), and the remaining unique identifiers may be secondary or lower tier identifiers.
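The relationship between a primary identifier and secondary identifiers might be tracked, for example, as in the following minimal sketch. The mapping, the function names, and the record key are hypothetical; the identifiers 1234 and 1235 follow the example above.

```python
from collections import defaultdict

# Maps a record key to its identifiers in the order they were assigned.
identifiers_by_record = defaultdict(list)


def assign_identifier(record_key: str, identifier: str) -> None:
    """Record a new identifier for an event (e.g., an upload) on a record."""
    identifiers_by_record[record_key].append(identifier)


def primary_identifier(record_key: str) -> str:
    """The first identifier assigned is treated as the primary identifier."""
    return identifiers_by_record[record_key][0]


def secondary_identifiers(record_key: str) -> list:
    """All later identifiers are treated as secondary identifiers."""
    return identifiers_by_record[record_key][1:]


# The same record is uploaded on two dates and receives two identifiers.
assign_identifier("record-A", "1234")
assign_identifier("record-A", "1235")
```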

In step 410, a computing device (e.g., the data load module 315) may load the data to its location (e.g., destination, such as an operational database, a storage database, or other database) if the data has not already been loaded. Loading data is described in detail in U.S. patent application Ser. No. 14/950,609, filed Nov. 24, 2015, and entitled DATA LOAD SYSTEM WITH DISTRIBUTED DATA FACILITY TECHNOLOGY, which is hereby incorporated by reference in its entirety.

Each data item may be loaded to a table or a plurality of tables. The loading process may be complex, and, for example, 40,000 or more elements may be inserted into data tables. One or more loaders may be used to load the data. In some aspects, the loader code might be different from the code used to back out data, and the back out process will be described in further detail below. Furthermore, the loader code might not communicate with the data back out code.

The data may be loaded with (e.g., in association with) the one or more unique identifiers for the data determined by the computing device. The unique identifier(s) for the data may be attached to one or more data tables that the data is loaded to. For example, if a particular file or source code is loaded to fifteen different locations, the unique identifier tied to the particular file or source code may be attached to the fifteen different locations. Sometimes the destinations may be loaded with bad data. As will be described in further detail below, a data back out process using the unique identifiers may be used to remove the bad data.
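Loading data in association with its unique identifier might look like the following minimal sketch, which uses an in-memory SQLite database from the Python standard library purely as a stand-in for the database(s) 320. The table names, the column name load_id, and the helper load_file are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical destination tables; each carries a load_id column so that
# every row can later be traced back to the load that produced it.
conn.execute("CREATE TABLE customers (load_id TEXT, name TEXT)")
conn.execute("CREATE TABLE balances (load_id TEXT, name TEXT, amount REAL)")


def load_file(conn, load_id, rows):
    """Insert each row into both tables, tagged with the load's identifier."""
    with conn:  # commits on success, rolls back if an insert fails
        for name, amount in rows:
            conn.execute("INSERT INTO customers VALUES (?, ?)", (load_id, name))
            conn.execute("INSERT INTO balances VALUES (?, ?, ?)",
                         (load_id, name, amount))


load_file(conn, "1234", [("alice", 10.0), ("bob", 20.0)])
load_file(conn, "1235", [("carol", 30.0)])

rows_for_1234 = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE load_id = ?", ("1234",)
).fetchone()[0]
```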

In step 415, a computing device (e.g., the data back out module 330) may determine whether bad data exists. Bad data may comprise, for example, corrupt data, incorrect data, unstable data, bad images, duplicate data, unsecure data, or something else rendering the data invalid. For example, a user may have uploaded data that pertains to one person to a data location associated with a different person. The user may also identify bad data as data compromised in a data breach.

In some aspects, a user (e.g., user 340) may interact with a user interface displayed on a user device (e.g., user device 345) to select or otherwise identify the bad data. For example, the user may input a file name. The computing device may determine the data or file corresponding to the file name. The computing device may also determine the one or more unique identifiers corresponding to the file name.

FIG. 5 illustrates an example user interface 500 for backing out data in which various aspects of the disclosure may be implemented. The user interface 500 may comprise a web interface (e.g., a data management interface) and may be written using HTML code, script code, or in any other format. The user interface 500 may be used by a user to interact with the data back out system described herein. The user interface 500 may include a data field 502 for the user to select or otherwise identify the bad data. The user may enter a specific file name in the data field 502B or use the data field 502B to search for the desired data. The data field 502A may display a list of files, such as in a dropdown menu, and the user may select the data to back out (e.g., by selecting a checkbox associated with each item to be backed out).

If the user selects a data file comprising several items, the computing device may determine each item in the file and indicate (e.g., display) to the user each item in the file that is to be backed out. The user interface 500 may also include a data field 504 for the user to input a date or time period (e.g., the date the data was loaded on or a time period that the data was loaded within) and a data field 506 for the user to input the source type for the data (e.g., a data field 506A for selecting a source type and/or a data field 506B for entering or searching for a source type).

Once the data and/or files have been identified by the user, the user may select an option 508 (e.g., a submit button or confirm button) to send a request to the system to back out the bad data.
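The request sent when the user selects the option 508 might carry the contents of the data fields 502, 504, and 506, for example as in the following minimal sketch. The class name, field names, and validation rules are hypothetical and are shown only to make the shape of such a request concrete.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class BackOutRequest:
    """Hypothetical payload assembled from the user interface 500."""
    file_names: List[str]               # from data field 502
    load_start: Optional[date] = None   # from data field 504 (start of period)
    load_end: Optional[date] = None     # from data field 504 (end of period)
    source_type: Optional[str] = None   # from data field 506

    def validate(self) -> None:
        """Reject requests that name no file or have a reversed period."""
        if not self.file_names:
            raise ValueError("at least one file must be identified")
        if self.load_start and self.load_end and self.load_start > self.load_end:
            raise ValueError("load_start must not be after load_end")


request = BackOutRequest(
    file_names=["accounts.xml"],
    load_start=date(2016, 1, 5),
    load_end=date(2016, 1, 12),
    source_type="SRC_A",
)
request.validate()  # raises ValueError if the request is malformed
```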

In step 420, a computing device (e.g., the data back out module 330) may determine (e.g., identify) the bad data (if it exists) based on the one or more unique identifiers associated with the bad data. That is, the computing device may use the unique identifiers to determine the items to remove from the system. The computing device may also use, along with the identifier(s), the date (e.g., load date) and/or the source type for the data.

A control table (e.g., control table 325) may be used by the computing device to determine which locations to search for the bad data. As previously described, the control table may include, for example, the unique identifier(s), the source code for the data, an identifier of the user that sent the file, the name of the file, a start time stamp for the file (e.g., when data load began), an end time stamp for the file (e.g., when data load ended), and other data (e.g., metadata) describing the bad data. By using the control table, the computing device might not need to search every data location for the bad data.

In some aspects, the computing device may search for bad data in a particular time period. For example, the computing device may search for bad data within, e.g., the last seven days, using the unique identifier(s) for the bad data corresponding to data loads that have occurred within the last seven days. As previously explained, data may be loaded twice (or more times), and a unique identifier may be assigned to the data for each load. Other time periods may be determined by the computing device or specified by the user, as previously described.
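A time-bounded search of the control table might be sketched as follows. The in-memory list of dictionaries, the key names, and the function identifiers_in_window are hypothetical stand-ins for the control table(s) 325 and the query logic.

```python
from datetime import datetime, timedelta

# Hypothetical control-table rows; the same file was loaded twice.
control_table = [
    {"unique_id": "1234", "file_name": "accounts.xml", "load_end": datetime(2016, 1, 2)},
    {"unique_id": "1235", "file_name": "accounts.xml", "load_end": datetime(2016, 1, 9)},
    {"unique_id": "1300", "file_name": "other.xml", "load_end": datetime(2016, 1, 10)},
]


def identifiers_in_window(rows, file_name, now, days=7):
    """Return identifiers of loads of file_name that ended in the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [
        row["unique_id"]
        for row in rows
        if row["file_name"] == file_name and cutoff <= row["load_end"] <= now
    ]


# Only the second load of accounts.xml falls within the seven-day window.
recent = identifiers_in_window(control_table, "accounts.xml", now=datetime(2016, 1, 12))
```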

In step 425, the computing device (e.g., the data back out module 330) may determine how the bad data was loaded. For example, the computing device may determine the process steps a particular application performed to load or otherwise input the bad data. The computing device may also determine which items within a file were loaded, such as all of the items within the file or a subset of all of the items. The computing device may also determine when (e.g., a date or time period) the bad data was loaded.

In step 430, the computing device may determine data related to the bad data. For example, the computing device may find all records associated with the bad data. In some aspects, the data related to the bad data may be time-related to the bad data. For example, the computing device may determine the data loaded to the databases within a specific time period. Data related to the bad data may be determined based on unique identifiers. For example, unique identifiers for one piece of loaded data may be associated or otherwise identified with unique identifiers for another piece of loaded data. The computing device may use the association of identifiers to determine the data related to the bad data. Any database table that holds relevant records may be used to identify data related to the bad data, such as the history of jobs, sources, and/or unique identifiers. In some aspects, a list of data potentially related to the bad data may be displayed to a user, and the user may select the related data to remove.
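Determining related data from associations between identifiers might, for example, be treated as a graph traversal, as in the following minimal sketch. The association map and the function name are hypothetical, and breadth-first search is only one possible way to follow the associations.

```python
from collections import deque

# Hypothetical associations between unique identifiers of loaded data.
associations = {
    "1234": {"1235"},
    "1235": {"1234", "1240"},
    "1240": {"1235"},
    "2000": set(),
}


def related_identifiers(start, associations):
    """Return every identifier reachable from `start`, excluding `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in associations.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


# Candidates that could be displayed to the user for selection.
candidates = related_identifiers("1234", associations)
```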

In step 435, the computing device may determine the location(s) of the bad data and/or the locations of data related to the bad data. Data locations may be within one or more databases. For example, data locations may be within a mid-range system that processed the bad data or file and/or one or more databases that store the bad data. Pointers or other unique identifiers may be used to identify these data locations. The data locations may be stored in one or more control tables, as previously described.

In step 440, the computing device (e.g., the data back out module 330) may remove (e.g., back out) the bad data and/or the data related to the bad data from the identified locations. As previously explained, the back out process may be used by the system to automatically clean up records in an operational database and/or a storage database. This may include the mid-range system that processed the data and the database(s) that store the bad data. The automated back out process may result in a significantly quicker data back out than with previous systems. For example and in some embodiments, the data may be backed out in 10 to 15 seconds. Previous systems, on the other hand, may have taken 4 to 8 hours to back out bad data.
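Removal based on a unique identifier might be sketched as follows, again using an in-memory SQLite database from the Python standard library as a stand-in for the database(s) 320. The table names, the load_id column, the locations mapping (standing in for a control table lookup), and the function back_out are hypothetical; the timing figures above are not reproduced by this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (load_id TEXT, name TEXT)")
conn.execute("CREATE TABLE balances (load_id TEXT, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("1234", "alice"), ("1235", "bob")])
conn.executemany("INSERT INTO balances VALUES (?, ?)",
                 [("1234", "alice"), ("1235", "bob")])

# Hypothetical control-table lookup: identifier -> tables that received the load.
locations = {"1234": ["customers", "balances"]}
# Table names cannot be bound as SQL parameters, so they are checked
# against a fixed allow-list before being placed in the statement.
ALLOWED_TABLES = {"customers", "balances"}


def back_out(conn, load_id):
    """Delete all rows tagged with load_id; return the number of rows removed."""
    removed = 0
    with conn:  # a single transaction: all deletions commit or none do
        for table in locations.get(load_id, []):
            if table not in ALLOWED_TABLES:
                raise ValueError("unexpected table: " + table)
            cursor = conn.execute(
                "DELETE FROM " + table + " WHERE load_id = ?", (load_id,)
            )
            removed += cursor.rowcount
    return removed


removed = back_out(conn, "1234")
```

Because only the tables named for the identifier are touched, the sketch does not scan every data location, which is consistent with the role of the control table described above.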

The computing device may use distributed data facility (DDF) threads to back out data, for example, for mainframe database and DB2 applications. Connections or threads (e.g., DDF threads) may be generated for backing out data from one or more databases, such as a mainframe database. An exemplary DDF connection or thread is a database access connection or thread. DDF connections may be used to quickly remove large chunks of data, such as XML data, from databases, such as relational database management systems (RDBMS) because multiple connections or threads (e.g., thousands) may be created and used to simultaneously (e.g., in parallel) remove data from the databases. Use of DDF threads is described in more detail in U.S. patent application Ser. No. 14/950,609, filed Nov. 24, 2015, and entitled DATA LOAD SYSTEM WITH DISTRIBUTED DATA FACILITY TECHNOLOGY, which is herein incorporated by reference in its entirety.
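The general idea of removing data over several connections at the same time can be illustrated with the following minimal sketch. It does not use DB2 or DDF; it uses a thread pool from the Python standard library, and the in-memory partitions dictionary, the partition names, and the function back_out_partition are hypothetical stand-ins for separate database connections.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for separate databases: (load_id, payload) rows.
partitions = {
    "db_a": [("1234", "alice"), ("1235", "bob")],
    "db_b": [("1234", "carol")],
    "db_c": [("1300", "dave")],
}


def back_out_partition(name, load_id):
    """Remove rows tagged with load_id from one partition; return the count."""
    rows = partitions[name]
    kept = [row for row in rows if row[0] != load_id]
    # Each worker replaces only its own partition's list.
    partitions[name] = kept
    return len(rows) - len(kept)


names = list(partitions)
with ThreadPoolExecutor(max_workers=len(names)) as pool:
    # One worker per partition; results come back in the order of `names`.
    counts = list(pool.map(lambda name: back_out_partition(name, "1234"), names))

total_removed = sum(counts)
```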

In step 445, the computing device (e.g., the data back out module 330) may verify the data back out, such as to confirm that all of the data requested to be backed out has been backed out. Verification may be based on one or more checks. For example, the computing device may compare the total number of records (or unique pieces of data) backed out to the total number of records requested to be backed out. The computing device may also use the item sequence number for each backed out item for verification. The item sequence number may comprise one unique column of relevant information for the backed out data. The verification process may depend on the type of database that the data was loaded to.
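The checks described above might be combined as in the following minimal sketch. The function name and its parameters are hypothetical, and item sequence numbers are represented as plain integers for illustration.

```python
def verify_back_out(requested, removed, remaining):
    """Return True only if every requested item, and nothing else, was backed out.

    All three arguments are iterables of item sequence numbers.
    """
    requested_set = set(requested)
    removed_set = set(removed)
    # Check 1: the number of items backed out equals the number requested.
    count_matches = len(removed_set) == len(requested_set)
    # Check 2: the backed out items are exactly the requested items.
    same_items = removed_set == requested_set
    # Check 3: no requested item is still present in the database.
    none_remaining = requested_set.isdisjoint(remaining)
    return count_matches and same_items and none_remaining


ok = verify_back_out([1, 2, 3], [3, 2, 1], remaining=[4, 5])
incomplete = verify_back_out([1, 2, 3], [1, 2], remaining=[3, 4, 5])
```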

In step 450, a computing device (e.g., the data load module 315 and/or the data back out module 330) may determine new data to load. For example, the new data may replace the data backed out by the computing device. The computing device may determine the previous location(s) of the bad data, and the new data may be loaded to those same locations. In some aspects, the new data may be a more recent version of the data or data that has otherwise been fixed (e.g., updated) relative to the bad data.

In step 455, a computing device (e.g., the data load module 315 and/or the data back out module 330) may determine one or more unique identifiers for the new data to be loaded. Step 455 may be similar to step 405 previously described.

In step 460, a computing device (e.g., the data load module 315) may load the new data to the appropriate data location(s). Step 460 may be similar to step 410 previously described. Accordingly, a new file may be sent and processed with new corrected information.

Various aspects described herein may be embodied as a method, an apparatus, or as computer-executable instructions stored on one or more non-transitory and/or tangible computer-readable media. Accordingly, those aspects may take the form of an entirely hardware embodiment, an entirely software embodiment (which may or may not include firmware) stored on one or more non-transitory and/or tangible computer-readable media, or an embodiment combining software and hardware aspects. Any and/or all of the method steps described herein may be embodied in computer-executable instructions stored on a computer-readable medium, such as a non-transitory and/or tangible computer readable medium and/or a computer readable storage medium. In addition, various signals representing data or events as described herein may be transferred between a source and a destination in the form of light and/or electromagnetic waves traveling through signal-conducting media such as metal wires, optical fibers, and/or wireless transmission media (e.g., air and/or space).

Aspects of the disclosure have been described in terms of illustrative embodiments thereof. Numerous other embodiments, modifications, and variations within the scope and spirit of the appended claims will occur to persons of ordinary skill in the art from a review of this disclosure. For example, one of ordinary skill in the art will appreciate that the steps illustrated in the illustrative figures may be performed in other than the recited order, and that one or more steps illustrated may be optional in accordance with aspects of the disclosure.

Claims

1. A method comprising:

determining one or more unique identifiers for data to be loaded from a source system to one or more databases;

loading the data from the source system to the one or more databases, wherein the data is loaded with the one or more unique identifiers;

determining, by a computing device, that a subset of the data loaded to the one or more databases comprises invalid data;

in response to determining that the subset of the data loaded to the one or more databases comprises invalid data, determining, by the computing device, one or more unique identifiers for the invalid data; and

removing, from the one or more databases, the invalid data based on the one or more unique identifiers for the invalid data.

2. The method of claim 1, wherein:

loading the data comprises loading a portion of the data from the source system to the one or more databases at a first time, and loading the portion of the data from the source system to the one or more databases at a second time, and

determining the one or more unique identifiers comprises determining first one or more unique identifiers for the portion of the data loaded at the first time and determining second one or more unique identifiers for the portion of the data loaded at the second time.

3. The method of claim 1, wherein determining that the subset of the data loaded to the one or more databases comprises invalid data comprises receiving, from a user device, an indication of the invalid data.

4. The method of claim 3, further comprising:

generating a user interface displayable by the user device, wherein the user interface comprises a data field for a user to provide the indication of the invalid data.

5. The method of claim 1, wherein removing the invalid data is further based on one or more load dates for the invalid data.

6. The method of claim 5, wherein the one or more load dates comprise a time period, and wherein removing the invalid data comprises removing the invalid data that was loaded to the one or more databases during the time period.
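As a non-limiting illustration of claims 5 and 6, removal may be narrowed to rows that carry a given identifier and that were loaded during a given time period. The sketch below assumes a hypothetical `records` table with a `load_date` column holding ISO-8601 date strings (which sort chronologically as text), an identifier shared across several loads from the same source, and a hypothetical helper named `back_out_in_period`.

```python
import sqlite3


def back_out_in_period(conn, load_id, start, end):
    """Delete rows with the given identifier loaded between start and end, inclusive.

    Returns the number of rows removed.
    """
    cur = conn.execute(
        "DELETE FROM records WHERE load_id = ? AND load_date BETWEEN ? AND ?",
        (load_id, start, end),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (load_id TEXT NOT NULL, load_date TEXT NOT NULL, payload TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("SRC1", "2016-01-10", "x"),  # same identifier, loaded before the period
        ("SRC1", "2016-01-11", "y"),  # inside the period: removed
        ("SRC1", "2016-01-12", "z"),  # inside the period: removed
        ("SRC2", "2016-01-11", "w"),  # different identifier: kept
    ],
)
conn.commit()

removed = back_out_in_period(conn, "SRC1", "2016-01-11", "2016-01-12")
remaining = [row[0] for row in conn.execute("SELECT payload FROM records ORDER BY payload")]
```

Combining the identifier with the load date in this way leaves earlier loads from the same source in place.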

7. The method of claim 1, further comprising:

after removing the invalid data, determining new data to load to the one or more databases, wherein the new data corresponds to the invalid data removed from the one or more databases; and

loading the new data to the one or more databases.
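One possible way to carry out the removal and subsequent reload of claim 7 is sketched below. Wrapping both steps in a single database transaction is an assumption made for this example, not a requirement of the claim; the table name `records`, the identifiers, and the helper `back_out_and_reload` are likewise hypothetical.

```python
import sqlite3


def back_out_and_reload(conn, invalid_load_id, corrected_rows, new_load_id):
    """Remove the invalid load, then load corrected rows under a new identifier."""
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if an exception is raised inside it.
    with conn:
        conn.execute("DELETE FROM records WHERE load_id = ?", (invalid_load_id,))
        conn.executemany(
            "INSERT INTO records (load_id, payload) VALUES (?, ?)",
            [(new_load_id, payload) for payload in corrected_rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (load_id TEXT NOT NULL, payload TEXT NOT NULL)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [("L1", "ok"), ("L2", "bad")])
conn.commit()

# Load "L2" was found to be invalid; replace it with corrected data as load "L3".
back_out_and_reload(conn, "L2", ["fixed"], "L3")
rows = sorted(conn.execute("SELECT load_id, payload FROM records").fetchall())
```

Giving the corrected data its own identifier means that, in this sketch, the reload could itself be backed out later in the same manner.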

8. An apparatus, comprising:

a processor; and

memory storing computer-executable instructions that, when executed by the processor, cause the apparatus to:

determine one or more unique identifiers for data to be loaded from a source system to one or more databases;

load the data from the source system to the one or more databases, wherein the data is loaded with the one or more unique identifiers;

determine that a subset of the data loaded to the one or more databases comprises invalid data;

in response to determining that the subset of the data loaded to the one or more databases comprises invalid data, determine one or more unique identifiers for the invalid data; and

remove, from the one or more databases, the invalid data based on the one or more unique identifiers for the invalid data.

9. The apparatus of claim 8, wherein:

loading the data comprises loading a portion of the data from the source system to the one or more databases at a first time, and loading the portion of the data from the source system to the one or more databases at a second time, and

determining the one or more unique identifiers comprises determining first one or more unique identifiers for the portion of the data loaded at the first time and determining second one or more unique identifiers for the portion of the data loaded at the second time.

10. The apparatus of claim 8, wherein determining that the subset of the data loaded to the one or more databases comprises invalid data comprises receiving, from a user device, an indication of the invalid data.

11. The apparatus of claim 10, wherein the memory stores additional computer-executable instructions that, when executed by the processor, cause the apparatus to:

generate a user interface displayable by the user device, wherein the user interface comprises a data field for a user to provide the indication of the invalid data.

12. The apparatus of claim 8, wherein removing the invalid data is further based on one or more load dates for the invalid data.

13. The apparatus of claim 12, wherein the one or more load dates comprise a time period, and wherein removing the invalid data comprises removing the invalid data that was loaded to the one or more databases during the time period.

14. The apparatus of claim 8, wherein the memory stores additional computer-executable instructions that, when executed by the processor, cause the apparatus to:

after removing the invalid data, determine new data to load to the one or more databases, wherein the new data corresponds to the invalid data removed from the one or more databases; and

load the new data to the one or more databases.

15. One or more non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more computing devices, cause the one or more computing devices to:

determine one or more unique identifiers for data to be loaded from a source system to one or more databases;

load the data from the source system to the one or more databases, wherein the data is loaded with the one or more unique identifiers;

determine that a subset of the data loaded to the one or more databases comprises invalid data;

in response to determining that the subset of the data loaded to the one or more databases comprises invalid data, determine one or more unique identifiers for the invalid data; and

remove, from the one or more databases, the invalid data based on the one or more unique identifiers for the invalid data.

16. The one or more non-transitory computer-readable medium of claim 15, wherein:

loading the data comprises loading a portion of the data from the source system to the one or more databases at a first time, and loading the portion of the data from the source system to the one or more databases at a second time, and

determining the one or more unique identifiers comprises determining first one or more unique identifiers for the portion of the data loaded at the first time and determining second one or more unique identifiers for the portion of the data loaded at the second time.

17. The one or more non-transitory computer-readable medium of claim 15, wherein determining that the subset of the data loaded to the one or more databases comprises invalid data comprises receiving, from a user device, an indication of the invalid data.

18. The one or more non-transitory computer-readable medium of claim 17, having computer-readable instructions stored thereon that, when executed by the one or more computing devices, cause the one or more computing devices to:

generate a user interface displayable by the user device, wherein the user interface comprises a data field for a user to provide the indication of the invalid data.

19. The one or more non-transitory computer-readable medium of claim 17, wherein removing the invalid data is further based on one or more load dates for the invalid data.

20. The one or more non-transitory computer-readable medium of claim 19, wherein the one or more load dates comprise a time period, and wherein removing the invalid data comprises removing the invalid data that was loaded to the one or more databases during the time period.