Abstract: A system and a method are disclosed for transforming the cluster computing resources of a target system to match a user-defined logical data flow, so that data processing on the source and the target is functionally equivalent. The source logical data flow is compatibly mapped to a target directed acyclic graph that represents the cluster computing resources. A series of subgraph assertion and transform operations is applied iteratively until the logical data flow is isomorphic with a directed acyclic graph. The assertion and transform operations comprise rules and assertions, which are maintained in a rules registry.
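The iterative assertion-and-transform loop summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The following are all hypothetical, chosen only for illustration: the adjacency-set graph representation, the `Rule` and `RulesRegistry` classes, the example "fuse adjacent map nodes" rule, and the stopping condition. The loop stops when no registered assertion matches and the result is then checked to be acyclic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]   # hypothetical: node name -> successor node names
Labels = Dict[str, str]       # hypothetical: node name -> operation kind
Match = Tuple[str, ...]       # nodes of a matched subgraph


@dataclass
class Rule:
    name: str
    # Assertion: returns a matched subgraph, or None if the rule does not apply.
    assertion: Callable[[Graph, Labels], Optional[Match]]
    # Transform: rewrites the graph in place around the matched subgraph.
    transform: Callable[[Graph, Labels, Match], None]


class RulesRegistry:
    """Hypothetical registry holding rules in registration order."""

    def __init__(self) -> None:
        self._rules: List[Rule] = []

    def register(self, rule: Rule) -> None:
        self._rules.append(rule)

    def rules(self) -> List[Rule]:
        return list(self._rules)


def predecessors(graph: Graph, node: str) -> Set[str]:
    return {src for src, dsts in graph.items() if node in dsts}


def is_acyclic(graph: Graph) -> bool:
    # Kahn's algorithm: a graph is a DAG iff every node can be topologically ordered.
    indegree = {n: 0 for n in graph}
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for s in graph[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return visited == len(graph)


def find_adjacent_maps(graph: Graph, labels: Labels) -> Optional[Match]:
    # Example assertion: a "map" node whose only successor is a "map" node
    # that has no other predecessor.
    for a, succs in graph.items():
        if labels[a] == "map" and len(succs) == 1:
            (b,) = tuple(succs)
            if labels[b] == "map" and predecessors(graph, b) == {a}:
                return (a, b)
    return None


def fuse_maps(graph: Graph, labels: Labels, match: Match) -> None:
    # Example transform: merge node b into node a.
    a, b = match
    graph[a] = set(graph[b])  # a inherits b's successors
    del graph[b]
    del labels[b]


def rewrite(graph: Graph, labels: Labels, registry: RulesRegistry,
            max_iterations: int = 1000) -> Graph:
    for _ in range(max_iterations):
        for rule in registry.rules():
            match = rule.assertion(graph, labels)
            if match is not None:
                rule.transform(graph, labels, match)
                break  # restart from the first rule after any change
        else:
            # Fixed point: no assertion matched in this pass.
            if not is_acyclic(graph):
                raise ValueError("rewritten graph is not a DAG")
            return graph
    raise RuntimeError("rewrite did not converge")


registry = RulesRegistry()
registry.register(Rule("fuse-adjacent-maps", find_adjacent_maps, fuse_maps))

graph = {"src": {"m1"}, "m1": {"m2"}, "m2": {"m3"}, "m3": {"sink"}, "sink": set()}
labels = {"src": "source", "m1": "map", "m2": "map", "m3": "map", "sink": "sink"}
result = rewrite(graph, labels, registry)
print(result)
```

In this sketch the chain of three map nodes collapses into the single node `m1`. Restarting from the first registered rule after every successful transform keeps rule priority simple. Real planners may instead batch matches or order rules differently.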
Abstract: A system and method for predicting the amount of time and/or resources required to execute a job on a big data set, and/or a system and method for automatically providing one or more suitable commands to a user for constructing a job that manipulates a big data set. The system and method are optionally, and preferably, implemented for Hadoop.