Method and system for data extraction from a transaction system to an analytics system
The present invention provides a method and system for the automatic extraction of data from a transaction system to an analytics system, which is capable of handling large volumes of application data.
The present invention relates to the areas of computer software, software engineering and development. In particular, the present invention provides a method and system for extraction of data from an application on a transaction system to an analytics system. The method and system provide generic extraction services for arbitrary data types and are capable of handling voluminous data generation by avoiding the use of a middleware channel.
BACKGROUND INFORMATION
It is often desirable to retain records of business transactions for analytical purposes such as data mining. Recent developments in business software systems and architecture have provided this functionality in an automated fashion.
A particular example of a transaction and analytics system is the collaborative behavior of an OLTP (“Online Transaction Processing”) system and an OLAP (“Online Analytical Processing”) system. OLTP refers to a type of computer processing in which a computer system responds immediately to user requests, as opposed to batch processing. Each request is considered to be a transaction. OLAP refers to a category of software tools that provide analysis of data stored in a database. OLAP tools enable users to analyze different dimensions of multidimensional data, such as time series and trend analysis views. OLAP is often used in data mining. Typically, an OLAP system includes a server, which sits between a client and a database management system (“DBMS”). The OLAP server understands how data is organized in the database and has special functions for analyzing the data.
A significant technical challenge concerns the mechanism through which transaction data 115 is to be made available (i.e., transported) to analytics system 105. Intelligent design of an architecture that permits flexible and efficient extraction of data from transaction system 110 to analytics system 105 may have a significant performance impact upon the interaction between transaction system 110 and analytics system 105. In particular, it is desirable that the chosen architecture provide generic extraction of arbitrary data types without requiring reprogramming on a case by case basis. In addition, it is important that the architecture accommodate the particular data load generated by the transaction system.
SUMMARY OF THE INVENTION
The present invention provides a generic extraction framework for extraction of data generated on a transaction system to an analytics system. The generic extraction framework allows extraction services to be efficiently created for arbitrary data types without the need for reprogramming on a case by case basis. In addition, the generic extraction framework obviates the need for the transmission of data using a middleware layer, and therefore provides an environment for transmission of high data volumes between a transaction and analytics system.
According to one embodiment of the invention an application developer using an application modeling environment may select data (e.g., transaction data) to be extracted from a software application to be run on the transaction system to the analytics system. The modeled application and the selection information are compiled against extraction services provided by the framework in order to provide these extraction services to the running application. An application generator generates both runtime extraction modules and data structures that provide for the extraction of data from the transaction system to the analytics system.
According to one embodiment, a framework class is provided that handles all tasks related to the extraction process. In addition, the application generator generates an extraction data structure that is utilized to maintain the extraction data. In order to enable the extraction process to operate within a preexisting architecture that includes an analytics adapter, a data source definition record is made accessible to the analytics adapter. The data source definition record maintains all the information necessary to enable the extraction process to occur, including the data source, extraction structure, application component and appropriate extraction modules.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention provides a generic framework for performing extraction services from a transaction system 110 running any number of applications to an analytics system 105. The generic extraction framework provided by the present invention accommodates arbitrary data types without the need for reprogramming on a case by case basis. In addition, the generic extraction framework is able to handle significant data volumes as it provides data transport services from an application running on a transaction system to a queue (where it is stored for later uploading to an analytics system) without the use of a middleware layer.
The overall extraction process typically comprises a number of distinct phases or modes.
Typically, each information package 235a, 235b, 235c is associated with a particular phase of the extraction process. For example, 181a depicts a full upload phase of an extraction process. During this phase, based upon instructions in full upload information package 235a, processor 215a on analytics system 105 executes RFC 245, which causes all data generated by application 209 to be extracted from transaction system 110 to analytics system 105. Extraction data is retrieved from transaction system database 245 and processed by selection module 225 and mapping module 227, and then transported to analytics system 105. The structure and function of selection and mapping modules 225 and 227 will become evident as the invention is further described. For now, it is sufficient to realize that selection module 225 and mapping module 227 perform some operations on transaction data stored in transaction database 245 so that it is in a format suitable for analytics system 105. In addition, during full upload phase 181a, metadata tables 255 on analytics system 105 are populated with information regarding transaction data to be extracted. Note that queue 285 on transaction system 110 is not utilized during full upload phase 181a.
181b shows a delta initialization phase of an extraction process. During delta initialization phase 181b, processor 215 on analytics system 105 initiates RFC call 245 that establishes channel 229 on transaction system 110. Channel 229 will allow transport of transaction data 115 from application 209 to queue 285. The structure and nature of channel 229 will become evident as the invention is further described. In addition, during delta initialization phase 181b, metadata tables 255 are populated.
Once delta initialization 181b has occurred, application 209 may generate transaction data, which is then automatically transported to queue 285 via channel 229 for later upload to analytics system 105. Typically, transaction data stored in queue 285 represents changes (i.e., delta information) of the data that was already uploaded during full upload phase 181a.
During delta upload phase 181c, processor 215 executes RFC 245, which causes delta transaction data stored in queue 285 to be uploaded to analytics system 105.
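The three phases described above (full upload, delta initialization, delta upload) can be sketched in a few lines. This is a minimal illustration only, not the patented implementation: the class names `TransactionSystem` and `AnalyticsSystem`, the `channel_open` flag standing in for channel 229, and the in-memory dictionary and deque standing in for the database and queue 285 are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TransactionSystem:
    """Hypothetical stand-in for transaction system 110."""
    database: dict = field(default_factory=dict)  # persisted transaction data
    queue: deque = field(default_factory=deque)   # delta queue (queue 285)
    channel_open: bool = False                    # whether channel 229 exists

    def save(self, key, record):
        """Persist a record; once the channel exists, also enqueue the change."""
        self.database[key] = record
        if self.channel_open:
            self.queue.append((key, record))


@dataclass
class AnalyticsSystem:
    """Hypothetical stand-in for analytics system 105."""
    store: dict = field(default_factory=dict)

    def full_upload(self, source):
        # Full upload: copy everything from the database; the queue is not used.
        self.store.update(source.database)

    def delta_init(self, source):
        # Delta initialization: establish the channel so later saves reach the queue.
        source.channel_open = True

    def delta_upload(self, source):
        # Delta upload: drain the queue and apply the queued changes.
        applied = 0
        while source.queue:
            key, record = source.queue.popleft()
            self.store[key] = record
            applied += 1
        return applied
```

In this sketch, records saved before `delta_init` reach the analytics side only through `full_upload`, while records saved afterwards are queued and picked up by the next `delta_upload`.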
As noted above, during delta initialization phase 181b, channel 229 is established for transport of delta transaction data to queue 285. A significant performance issue concerns the efficiency and structure of channel 229.
A significant technical issue concerns the nature of channel 229 for transmission of transaction data 115 from application 209 to queue 285. A related technical issue concerns how transaction data 115 is to be prepared for insertion into queue 285 in a form that is appropriate for analytics system 105. These design choices may have a significant performance impact upon the extraction process.
For example,
The architecture shown in
Transaction data container 205 is transmitted from application 209 through middleware 230. Service API 227 provides functions for transforming transaction data 115 stored in transaction data container 205 into flat file structure 121, which is compatible with analytics system 105. Transaction data 115 that has been converted to flat file structure 121 is stored in queue 285 for retrieval by analytics system 105 via RFC call 245. Among other functions, analytics adapter module 240 prepares transaction data 115 for processing by analytics system 105. Analytics system 105 makes RFC call 245 to analytics adapter module 240 to perform upload of data in queue 285. Among other functions, analytics adapter module 240 performs functions for upload of data to analytics system 105.
The architecture shown in
Furthermore, in many scenarios, transaction system 110 may generate a significant amount of data. For example, in a resales order transaction system, high volumes of data are typically generated. Using the scenario shown in
Extraction framework 310 provides services and functionality for performing extraction of data from transaction system 110 to analytics system 105. According to one embodiment of the present invention, application developer 350 may model application 209 to invoke extraction services provided by extraction framework 310.
Application modeling environment 360 generates metadata 317, which is processed by application generator 329. Metadata 317 includes information representing the modeling choices made by application developer 350. In particular, among other things, metadata 317 reflects the services to be invoked from application framework 315 and extraction framework 310. Application generator 329 receives metadata 317 and generates application 209. In particular, according to one embodiment, application generator 329 generates extraction data structure 320, extraction runtime components 330 and application runtime components 340.
Application runtime components 340 represent runtime files for executing application 209, and in particular those services associated with application framework 315. Application runtime components 340 may include runtime executable files, resources, etc. (e.g., DLL files, EXE files, Java Byte Code, etc.). Similarly, extraction runtime components 330 may include runtime files for executing extraction services provided by extraction framework 310. Extraction runtime components 330 include runtime executable files, resources, etc. (e.g., DLL files, EXE files, Java Byte Code).
As shown in
According to one embodiment of the present invention, extraction event handler 435 detects events occurring with respect to application 209. Upon detection of particular events, extraction event handler 435 causes particular actions to occur. The function of extraction event handler 435 will become evident as the invention is further described. According to one embodiment of the present invention, event handler 435 detects save events whereby transaction data 115 is written to transaction system database 245.
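The role of the event handler can be illustrated with a simple observer arrangement. The sketch below is hypothetical: the `Application` and `ExtractionEventHandler` classes, the `on_save` registration method, and the choice to copy each saved record into a queue are assumptions for illustration, since the text only states that save events are detected and cause particular actions to occur.

```python
from collections import deque


class Application:
    """Hypothetical application that notifies listeners whenever data is saved."""

    def __init__(self):
        self.database = []         # stand-in for the transaction system database
        self._save_listeners = []  # callables invoked on every save event

    def on_save(self, listener):
        """Register a callable to be invoked with each saved record."""
        self._save_listeners.append(listener)

    def save(self, record):
        """Write the record to the database, then raise the save event."""
        self.database.append(record)
        for listener in self._save_listeners:
            listener(record)


class ExtractionEventHandler:
    """Hypothetical handler that reacts to save events by queueing the data."""

    def __init__(self, queue):
        self.queue = queue

    def handle_save(self, record):
        # Copy the record so later changes in the application do not alter
        # what has already been queued for extraction.
        self.queue.append(dict(record))
```

A handler would be attached with `app.on_save(handler.handle_save)`, after which every call to `app.save(...)` also places a copy of the record in the queue.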
The generic extraction framework according to the present invention may be stored as a set of instructions that is accessible and executable by a processor. This set of instructions may be stored in a storage subsystem that may include a compact disk, hard drive, DVD-ROM, CD-ROM or any type of computer- or machine-readable storage medium.
RFC call 245 causes transaction data stored in transaction system database 245 to be extracted from transaction system database 245 and processed by selection 225 and mapping 227 modules. Selection module 225 determines the data stored in transaction system database 245 that is to be extracted from transaction system database 245. As noted above, the selection of data to be extracted from transaction system 110 to analytics system 105 is established during application modeling (i.e., see
Once transaction data stored in transaction system database 245 is processed by selection module 225 and mapping module 227, it is passed to analytics adapter 240, where it is uploaded to analytics system 105 and stored in DBMS 130. In addition, during the full upload process, metadata tables 255 on analytics system 105 are populated with information regarding the transaction data 115 that is to be extracted to analytics system 105.
Processor 215b then makes a call to analytics adapter module 240 to cause delta transaction data stored in queue 285 to be uploaded to analytics system 105. Data received at analytics system 105 is stored in DBMS 130, where it is available for further processing, analytics, refinement, etc.
According to one embodiment, a generic extractor class 405 handles all tasks related to the extraction process. According to one embodiment, an extraction class 405 CL_CMS_LO_BW_APPL_EXTRACT has the following methods:
CMS_LO_BW_GET_TIME_INTERVAL
This method reports when the analytics system was last updated with data using the datasource.
CMS_LO_BW_UPDATE_TIME_INTERVAL
This method updates the information about the time when the update using the datasource is performed.
CMS_LO_BW_EXTRACT
The method performs the actual fetch of the data on the basis of the selection options, in the form of data packets. This method does not use the time stamp approach to handle the delta update requirements. It works on the premise that the delta queues would be directly updated with changed data after the initial upload.
CMS_LO_BW_EXTRACT_DELTA
The method performs the actual fetch of the data on the basis of the selection options, in the form of data packets. This method uses the time stamp approach to handle the delta update requirements. It works on the premise that the delta queues would be updated with changed data after the initial upload, using the timestamps of the previous upload maintained in the timestamps table.
CMS_LO_BW_MAPPER
This method is used to map the complex data into a flat structure so that it can be moved across to analytics system 105.
LOG_WRITE
This method is used to log exceptions to a log.
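The time stamp approach described for the delta extraction method can be sketched as follows. This is an illustration under stated assumptions, not the actual method signatures (which are not reproduced here): the `DeltaExtractor` class, the `changed_at` field on each record, the `packet_size` parameter, and the dictionary standing in for the timestamps table are all hypothetical.

```python
from datetime import datetime


class DeltaExtractor:
    """Hypothetical time-stamp based delta extractor (all names illustrative)."""

    def __init__(self, records, packet_size=2):
        self.records = records          # each record carries a "changed_at" datetime
        self.packet_size = packet_size  # maximum number of records per data packet
        self.timestamps = {}            # stand-in for the timestamps table

    def get_time_interval(self, datasource: str):
        """Return the time of the last upload for this data source, or None."""
        return self.timestamps.get(datasource)

    def update_time_interval(self, datasource: str, when: datetime):
        """Record the time of the upload that was just performed."""
        self.timestamps[datasource] = when

    def extract_delta(self, datasource: str, now: datetime):
        """Return records changed since the last upload, split into packets."""
        last = self.get_time_interval(datasource)
        changed = [
            r for r in self.records
            if (last is None or r["changed_at"] > last) and r["changed_at"] <= now
        ]
        packets = [
            changed[i:i + self.packet_size]
            for i in range(0, len(changed), self.packet_size)
        ]
        self.update_time_interval(datasource, now)
        return packets
```

Because the stored timestamp advances on every call, a second extraction for the same data source returns only the records changed after the previous one.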
According to one embodiment, the CMS_LO_BW_EXTRACT method has the following signature:
According to one embodiment, the CMS_LO_BW_MAPPER method has the following signature:
According to one embodiment of the present invention, selection module 225 is implemented as follows.
Selection Module Template
The following is pseudo-code for selection module 225 according to one embodiment of the present invention:
- 1. Create the instance of the generic extractor class cl_cms_lo_bw_appl_extract.
- 2. If lv_initflag is not initial.
- Loop at the input selection options.
- Append the input selection options to the selection options table.
- Endif.
- 3. Use the generic extractor instance to call the extraction method.
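The selection module steps above can be sketched as runnable code. This is a hedged illustration only: the `GenericExtractor` class is a hypothetical stand-in for cl_cms_lo_bw_appl_extract, and the representation of a selection option as a `(field, low, high)` inclusive range is an assumption, since the text does not specify the format of the selection options.

```python
class GenericExtractor:
    """Hypothetical stand-in for the generic extractor class."""

    def __init__(self, rows):
        self.rows = rows

    def extract(self, selection_options):
        """Return rows satisfying every (field, low, high) inclusive range."""
        return [
            row for row in self.rows
            if all(low <= row[name] <= high for name, low, high in selection_options)
        ]


def selection_module(rows, init_flag, input_options):
    # Step 1: create the instance of the generic extractor class.
    extractor = GenericExtractor(rows)

    # Step 2: if the init flag is set, copy the input selection options
    # into the selection options table.
    selection_table = []
    if init_flag:
        for option in input_options:
            selection_table.append(option)

    # Step 3: use the generic extractor instance to call the extraction method.
    return extractor.extract(selection_table)
```

With an empty selection table, every row passes, so a call with the flag unset returns all rows in this sketch.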
According to one embodiment of the present invention, mapping module 227 is implemented as follows.
Mapping Module Template
The following is pseudo-code for mapping module 227 according to one embodiment of the present invention:
- 1. If no class instance exists.
- a. Create class cl_cms_lo_bw_appl_extract instance. Endif.
- 2. Call the mapper method using the class instance.
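The mapping module steps, together with the mapper's job of turning complex data into a flat structure, can be sketched as follows. The flattening rule used here (joining nested field names with underscores) is an assumption chosen for illustration; the text does not describe how field names in the flat structure are derived. `GenericExtractor`, `get_extractor` and `mapping_module` are hypothetical names.

```python
class GenericExtractor:
    """Hypothetical stand-in for the generic extractor class."""

    def mapper(self, record, prefix=""):
        """Flatten nested dictionaries into a single-level dictionary."""
        flat = {}
        for key, value in record.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                # Recurse into nested structures, extending the field name.
                flat.update(self.mapper(value, prefix=f"{name}_"))
            else:
                flat[name] = value
        return flat


_extractor = None  # module-level instance, created on first use


def get_extractor():
    """Step 1: create the class instance only if none exists yet."""
    global _extractor
    if _extractor is None:
        _extractor = GenericExtractor()
    return _extractor


def mapping_module(record):
    """Step 2: call the mapper method using the class instance."""
    return get_extractor().mapper(record)
```

For example, a record with a nested `partner` entry containing `name` would yield a flat field called `partner_name` under this naming rule.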
According to one embodiment, application generator 515 utilizes the following interface, which interprets the selection module template and mapping module template shown above.
The data source field identifies the name of the data source definition record. Extraction structure 830 identifies the name of the extraction data structure 320 generated by application generator 515. Application component 840 identifies the associated application component 209 for which data extraction is to occur. Selection and mapping modules 850 and 860 respectively identify the selection and mapping modules (225, 227).
Normally, BDoc field 830 would identify a transaction data container type 205 for transmitting data from a transaction system 110 to analytics system 105. However, the present invention provides a method for extraction without the use of a transaction data container 205, so this field is left blank.
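The fields described above can be pictured as a simple record type. The sketch below is illustrative only and does not reproduce the actual record layout: the class name `DataSourceDefinition`, the attribute names, and the sample values are hypothetical; only the field roles (data source, extraction structure, application component, selection module, mapping module, and a blank BDoc field) come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceDefinition:
    """Hypothetical in-memory form of a data source definition record."""
    data_source: str            # name of the data source definition record
    extraction_structure: str   # name of the generated extraction data structure
    application_component: str  # application for which extraction is to occur
    selection_module: str       # name of the selection module
    mapping_module: str         # name of the mapping module
    bdoc: str = ""              # left blank: no transaction data container is used


# Sample values below are invented purely for illustration.
example_record = DataSourceDefinition(
    data_source="EXAMPLE_SOURCE",
    extraction_structure="EXAMPLE_STRUCTURE",
    application_component="EXAMPLE_APPLICATION",
    selection_module="EXAMPLE_SELECTION_MODULE",
    mapping_module="EXAMPLE_MAPPING_MODULE",
)
```

Leaving `bdoc` at its empty default mirrors the point made above: the record keeps the field for compatibility with the existing adapter, but nothing is placed in it.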
The following is an exemplary set of fields for inclusion in a data source definition record 525 according to one embodiment of the present invention.
A method and system for extraction of data to an analytics system has been described. The present invention provides a method for extraction of data from an application to an analytics system using a generic extraction data structure generated during application generation. This method eliminates the need for an event driven middleware approach and is thus suitable for environments in which large amounts of data are generated.
Several embodiments of the invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
Claims
1. A method for extracting data from a software application to an analytics system comprising:
- receiving extraction information identifying data to be extracted to the analytics system;
- generating an application as a function of the extraction information data, wherein the application includes: a data structure for storing extraction data; and at least one component for performing an extraction function to prepare data for processing by the analytics system; and
- configuring the analytics system to invoke the at least one component performing an extraction function at at least one predetermined time.
2. The method according to claim 1, wherein the at least one component includes a selection module.
3. The method according to claim 1, wherein the at least one component includes a mapping module.
4. The method according to claim 1, wherein the data structure is a complex data structure.
5. The method according to claim 4, wherein the at least one component transforms the complex data structure to a flat file structure.
6. The method according to claim 1, wherein the application is an online transaction processing (“OLTP”) system.
7. The method according to claim 1, wherein the analytics system is an online analytics processing (“OLAP”) system.
8. A system for extracting data from a software application to an analytics system comprising:
- a processor, the processor providing an application modeling environment for: receiving extraction information identifying data to be extracted to the analytics system; generating an application as a function of the extraction information data, wherein the application includes: a data structure for storing extraction data; and at least one component for performing an extraction function to prepare data for processing by the analytics system; and configuring the analytics system to invoke the at least one component performing an extraction function at at least one predetermined time.
9. The system according to claim 8, wherein the at least one component includes a selection module.
10. The system according to claim 8, wherein the at least one component includes a mapping module.
11. The system according to claim 8, wherein the data structure is a complex data structure.
12. The system according to claim 11, wherein the at least one component transforms the complex data structure to a flat file structure.
13. A program storage device, the program storage device including instructions for:
- receiving extraction information identifying data to be extracted to the analytics system;
- generating an application as a function of the extraction information data, wherein the application includes: a data structure for storing extraction data; and at least one component for performing an extraction function to prepare data for processing by the analytics system; and
- configuring the analytics system to invoke the at least one component performing an extraction function at at least one predetermined time.
14. The program storage device according to claim 13, wherein the at least one component includes a selection module.
15. The program storage device according to claim 13, wherein the at least one component includes a mapping module.
16. The program storage device according to claim 13, wherein the data structure is a complex data structure.
17. The program storage device according to claim 13, wherein the at least one component transforms the complex data structure to a flat file structure.
18. A machine-readable medium storing a generic extraction framework for extraction data generated by an application to an analytics system comprising:
- a generic extraction class, the generic extraction class providing services for population of a complex data structure as a function of transaction data generated by the application;
- a data handler configured to transport data stored in the complex data structure to a buffer for uploading to the analytics system;
- an extraction data structure, comprising a complex data type; and
- an event handler configured to detect a generation of transaction data by the application and to invoke at least one method of the generic extraction class.
19. The machine-readable medium of claim 18, wherein the buffer is a queue.
20. The machine-readable medium of claim 19, wherein the complex data structure is compatible with a data container designed to utilize a middleware for the transmission of transaction data from the application to the buffer.
21. The machine-readable medium of claim 20, wherein the generic extraction class invokes at least one method call to a service expecting the passing of a transaction data container, but instead passes the complex data structure.
Type: Application
Filed: Jun 30, 2004
Publication Date: Jan 5, 2006
Patent Grant number: 7774298
Inventors: Vishal Trivedi (Bangalore), Venkiteswaran Vadakkencherry (Bangalore)
Application Number: 10/880,861
International Classification: G06F 17/30 (20060101);