METHOD FOR SERVICE ORIENTED DATA EXTRACTION TRANSFORMATION AND LOAD
The present invention relates to a method for the configurable real time transformation of dissimilar data sources, the method further consisting of the steps of acquiring real time information pertaining to at least one data source, wherein the information comprises reference information that is associated with the data source, data transformation specification information that is associated with the data source, and scheduled event specification information that is associated with the data source, and maintaining the data source information. The method further comprises the steps of acquiring data from the data source in accordance with a specified scheduled event, converting the acquired data into a predetermined standardized format, performing at least one data transformation function from the real time stream upon the converted data in accordance with the acquired data transformation specification information that was associated with the data source; and transmitting the transformed data to a destination data source.
1. Field of the Invention
This invention relates to methodologies for extracting data from data sources on a network, and particularly to methodologies for service oriented data extraction and data transformation.
2. Description of Background
Before our invention, large business enterprises typically implemented a plurality of dissimilar data sources within their operational networks, in addition to interacting on a daily basis with a wide variety of external business sources (such as business transactions or structured data acquisition processes). The sharing, acquisition, transformation, and migration of managed data incur significant costs. Incorporating new data feeds or enabling new business transactions is usually a costly and lengthy process. Once a business enterprise has decided on a specific product or product line for data management, it is often very difficult to migrate to a simpler, better, or more cost effective solution because of configuration differences between the existing and proposed data schemas.
Automated tools for managing a business enterprise's data efficiently are a necessity in today's business environment. One frequently used class of tools is stand-alone proprietary data transformation and schema mapping applications. Such tools are used to provide configurable data transformation processes for data migration, sharing, and reporting, in addition to performing mapping operations for business transactional operations. These tools provide the means to achieve their configuration goals provided they are compatible with both the source and the destination data sources, in addition to being compatible with the available operating environment. While such automated tools provide valuable functionality, they remain proprietary, and they lack the flexibility and adaptability needed in today's business environment.
Therefore, there exists a need for a flexible service oriented solution to reduce the overall cost that is associated with data migration, sharing, reporting and mapping.
SUMMARY OF THE INVENTION
The shortcomings of the prior art are overcome and additional advantages are provided through the provision of a method for the configurable real time transformation of dissimilar data sources, the method further consisting of the steps of acquiring real time information pertaining to at least one data source, wherein the information comprises reference information that is associated with the data source, data transformation specification information that is associated with the data source, and scheduled event specification information that is associated with the data source, and maintaining the data source information.
The method further comprises the steps of acquiring data from the data source in accordance with a specified scheduled event, converting the acquired data into a predetermined standardized format, performing at least one data transformation function from the real time stream upon the converted data in accordance with the acquired data transformation specification information that was associated with the data source; and transmitting the transformed data to a destination data source.
Computer program products corresponding to the above-summarized methods are also described and claimed herein.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.
The subject matter that is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
The detailed description explains the preferred embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.
DETAILED DESCRIPTION OF THE INVENTION
One or more exemplary embodiments of the invention are described below in detail. The disclosed embodiments are intended to be illustrative only since numerous modifications and variations therein will be apparent to those of ordinary skill in the art. In reference to the drawings, like numbers will indicate like parts continuously throughout the views.
Aspects of the present invention relate to systems and methodologies for the configuration and implementation of data extraction, transformation, and load tool solutions for business enterprises. As such, a flexible service oriented solution to reduce the overall cost associated with data migration, sharing, reporting and mapping is presented. By allowing dynamic, real-time reconfiguration and operation of the extract, transform and load (ETL) processing landscape, aspects of the present invention give a business enterprise the capacity to adapt to an ever changing business environment at a much lower cost. The present invention provides solutions that can be used together to quickly enable new business transactions, in addition to reducing the cost and time of migrations to more cost effective data management solutions.
Within aspects of the present invention, reference, transformation, schedule, and real time specifications of remote or local data sources are created and maintained within the inventive system. On the specified schedule, remote or locally stored data is extracted and uploaded to the destination data source(s). Specifically, within embodiments of the present invention, the extracted data is normalized on a per-fact basis into an XML document. The normalized XML data/document is transformed into a desired format according to transformation specifications that have been associated with the extracted data. Thereafter, the transformed data is loaded to a destination data source.
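By way of illustration only, the per-fact normalization step described above can be sketched as follows. The specification does not define an XML schema, so the element names (facts, fact, field), the function name normalize, and the sample records are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def normalize(records, source_id):
    """Normalize extracted records into one XML document, one <fact> per record.

    The <facts>/<fact>/<field> layout is a hypothetical schema chosen for
    illustration; the specification only states that the data is normalized
    on a per-fact basis into an XML document.
    """
    root = ET.Element("facts", {"source": source_id})
    for record in records:
        fact = ET.SubElement(root, "fact")
        for name, value in record.items():
            field = ET.SubElement(fact, "field", {"name": name})
            field.text = str(value)  # every value is stored as text
    return ET.tostring(root, encoding="unicode")


# Hypothetical rows as they might arrive from an extraction step.
extracted = [{"sku": "A-1", "qty": 3}, {"sku": "B-2", "qty": 5}]
document = normalize(extracted, "inventory-db")
```

Storing every value as text keeps the normalized form independent of the source's native types; any type conversion is then left to the transformation step.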
Aspects of the present invention are carried out within a computing system environment. The computer system as operated by a system user can embody a conventional personal computer on which a Web services based application that is configured to accomplish the objectives of the present methodologies is operating. As is conventional, the computer system also includes other hardware and software elements that are conventionally included in personal computers.
Turning now to the drawings in greater detail, it will be seen that in
Further comprised are a data extraction component 120, a data transformation component 135, and a data load component 150. Source data that is to be processed within the system is extracted from a source data source 145, normalized into a predetermined data format at a normalized data store 125, and eventually uploaded to the specified destination data source(s) 155.
A comprehensive listing of the data sources from which the system is configured to extract data is stored at the data source repository 130. Within aspects of the present invention, the data extractor 120 acts to gather specified data from a data source 145 that is referenced at the data source repository 130. The extraction operation can be initiated either on demand by a system operator or as part of a scheduled event that has been recorded by the data source scheduler 115. Specifically, a data extraction operation comprises the function of fetching data from a data source 145. The data is extracted from the specified data source 145 at a rate that complies with the performance limitations of the remote data source host 145. Thereafter, the extracted data is transmitted to the normalized data store 125, where the extracted data is normalized and stored into XML documents.
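As a minimal sketch of the extraction behavior described above, the following example shows an extraction that runs either when a scheduled event is due or when an operator requests it, and that pauses between batches so as not to exceed the source host's capacity. The names ScheduledEvent and run_extraction, the batch-based fetch interface, and the fixed pause are assumptions for illustration and are not taken from the specification.

```python
import time
from dataclasses import dataclass


@dataclass
class ScheduledEvent:
    """Hypothetical schedule entry, as a scheduler might keep per data source."""
    source_id: str
    next_run: float    # epoch seconds of the next scheduled extraction
    interval_s: float  # seconds between scheduled extractions

    def is_due(self, now):
        return now >= self.next_run

    def advance(self):
        self.next_run += self.interval_s


def run_extraction(event, fetch_batches, now, on_demand=False,
                   pause_s=0.5, sleep=time.sleep):
    """Fetch all batches if the event is due or an operator forces the run.

    fetch_batches is any callable returning an iterable of row batches.
    Returns the extracted rows, or None when nothing was due.
    """
    if not (on_demand or event.is_due(now)):
        return None
    rows = []
    for batch in fetch_batches():
        rows.extend(batch)
        sleep(pause_s)  # throttle: pause before requesting the next batch
    if not on_demand:
        event.advance()  # only scheduled runs move the schedule forward
    return rows
```

Passing the sleep function as a parameter is a design convenience that lets the throttling be replaced or observed without real delays.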
The normalized data is retrieved from the normalized data store 125 by the data transformation component 135. Thereafter, the normalized data is transformed according to a set of data transformation rules that are contained within a data transformation specification. Data transformation specifications are stored and maintained at the transformation specification repository 140. Further, within embodiments of the present invention, data transformation specifications can be altered or modified by a system operator at the time of the execution of the transformation of the normalized data. Essentially, data transformation is defined as the process of converting normalized data into its final form prior to the operation of the data load component 150, which transmits (i.e., inserts or updates) the transformed data to its final destination data source.
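The rule-driven transformation described above can be illustrated with a short sketch. The specification does not state how transformation rules are expressed, so the rule vocabulary (rename, cast_int, drop), the dictionary-based rule format, and the assumed XML layout of the normalized document are hypothetical. The extra_rules parameter stands in for modifications made by an operator at execution time.

```python
import xml.etree.ElementTree as ET


def apply_rule(record, rule):
    """Apply one hypothetical transformation rule to a copy of the record."""
    record = dict(record)
    op = rule["op"]
    if op == "rename":
        record[rule["to"]] = record.pop(rule["field"])
    elif op == "cast_int":
        record[rule["field"]] = int(record[rule["field"]])
    elif op == "drop":
        record.pop(rule["field"], None)
    else:
        raise ValueError(f"unknown rule: {op}")
    return record


def transform(normalized_xml, spec, extra_rules=()):
    """Transform each <fact> using the stored spec plus run-time additions."""
    rules = list(spec) + list(extra_rules)
    results = []
    for fact in ET.fromstring(normalized_xml).findall("fact"):
        # Assumed layout: <fact><field name="...">value</field>...</fact>
        record = {f.get("name"): f.text for f in fact.findall("field")}
        for rule in rules:
            record = apply_rule(record, rule)
        results.append(record)
    return results


doc = ('<facts><fact><field name="sku">A-1</field>'
       '<field name="qty">3</field></fact></facts>')
spec = [
    {"op": "rename", "field": "sku", "to": "item_code"},
    {"op": "cast_int", "field": "qty"},
]
rows = transform(doc, spec)
adjusted = transform(doc, spec, extra_rules=[{"op": "drop", "field": "qty"}])
```

Keeping the rules as plain data rather than code is one way a specification could be stored in a repository and edited without redeploying the transformation component.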
Within yet further aspects of the present invention, there are specific protocols in place to assist in the remote or local storage and maintenance of data source references. In particular, data source references can be stored along with all the properties necessary to establish a proper connection for the data extraction operation, including any such credentials that are necessary for a successful extraction process.
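One possible shape for such a stored data source reference is sketched below. The field names, the in-memory repository, and the example connection values are assumptions made for illustration; the specification does not describe a storage format, and a practical deployment would likely keep credentials in a dedicated secret store rather than in the record itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DataSourceReference:
    """Hypothetical record holding what is needed to connect to a source."""
    source_id: str
    location: str  # e.g. a host name, URL, or file path
    properties: Dict[str, str] = field(default_factory=dict)
    # repr=False keeps credentials out of printed or logged representations.
    credentials: Optional[Dict[str, str]] = field(default=None, repr=False)


class DataSourceRepository:
    """Minimal in-memory stand-in for a data source repository."""

    def __init__(self):
        self._refs = {}

    def save(self, ref):
        self._refs[ref.source_id] = ref  # create or update

    def get(self, source_id):
        return self._refs[source_id]

    def remove(self, source_id):
        self._refs.pop(source_id, None)


repo = DataSourceRepository()
repo.save(DataSourceReference(
    source_id="inventory-db",
    location="db.example.internal",       # placeholder host
    properties={"port": "5432", "timeout_s": "30"},
    credentials={"user": "etl", "password": "not-a-real-secret"},
))
ref = repo.get("inventory-db")
```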
The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.
As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately.
Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.
The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
While the preferred embodiment to the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.
Claims
1-5. (canceled)
6. A method for the configurable real time transformation of dissimilar data sources in a web service-based environment, the method comprising:
- acquiring real time information pertaining to at least one data source, wherein the information comprises reference information that is associated with the data source, data transformation specification information that is associated with the data source, and scheduled event specification information that is associated with the data source;
- maintaining the data source information;
- acquiring data from the data source in accordance with a specified scheduled event;
- converting the acquired data into a predetermined standardized format using standardized normalization and a conversion format;
- performing at least one data transformation function identified from the real time information upon the converted data in accordance with the acquired data transformation specification information associated with the data source;
- updating an extraction schedule; and
- transmitting the transformed data to a destination data source;
- wherein the standardized normalization and conversion format is managed using extensible markup language;
- wherein the data sources is stored at a remote location;
- wherein the data sources, transformation specification information, and scheduled event specification information associated with the data source are configured to be modified.
Type: Application
Filed: Jan 19, 2007
Publication Date: Jul 24, 2008
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY)
Inventor: Alfredo Alba (Morgan Hills, CA)
Application Number: 11/624,893
International Classification: G06F 15/16 (20060101);