Abstract: Executing graph-based computations includes: accepting a specification of a computation graph in which data processing elements are joined by linking elements; dividing the data processing elements into sets, at least one of the sets including multiple of the data processing elements; assigning to each set a different computing resource; and processing data according to the computation graph, including performing computations corresponding to the data processing elements using the assigned computing resources.
Type:
Grant
Filed:
May 16, 2006
Date of Patent:
January 11, 2011
Assignee:
Ab Initio Technology LLC
Inventors:
Joseph Skeffington Wholey, III, Igor Sherb, Ephraim Meriwether Vishniac
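As a rough illustration of the steps in the abstract above (not the patented implementation), the following sketch divides the elements of a small graph into sets, gives each set its own resource label, and runs the elements in dependency order. The names `assign_resources`, `run_graph`, and the `worker-N` labels are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def assign_resources(sets):
    """Give each set of elements its own (hypothetical) resource label."""
    return {name: f"worker-{i}" for i, group in enumerate(sets) for name in group}

def run_graph(elements, links, sets, initial):
    """Run every element once, in dependency order, noting its assigned resource."""
    resource_of = assign_resources(sets)
    preds = {name: [] for name in elements}
    for src, dst in links:
        preds[dst].append(src)
    results, log = {}, []
    for name in TopologicalSorter(preds).static_order():
        # Elements with no upstream link receive the initial input.
        inputs = [results[p] for p in preds[name]] or [initial]
        results[name] = elements[name](inputs)
        log.append((name, resource_of[name]))
    return results, log

elements = {
    "read": lambda xs: xs[0],
    "double": lambda xs: [2 * v for v in xs[0]],
    "total": lambda xs: sum(xs[0]),
}
links = [("read", "double"), ("double", "total")]
sets = [{"read", "double"}, {"total"}]  # one set holds multiple elements
results, log = run_graph(elements, links, sets, [1, 2, 3])
```

Here the resource is only a label recorded in `log`; a real system would dispatch each set to a separate process or thread.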
Abstract: A method, and corresponding system and software, is described for writing data to a plurality of queues, each portion of the data being written to a corresponding one of the queues. The method includes, without requiring concurrent locking of more than one queue, determining if a space is available in each queue for writing a corresponding portion of the data, and if available, reserving the spaces in the queues. The method includes writing each portion of the data to a corresponding one of the queues.
Type:
Grant
Filed:
June 27, 2005
Date of Patent:
January 4, 2011
Assignee:
Ab Initio Technology LLC
Inventors:
Spiro Michaylov, Sanjeev Banerji, Craig W. Stanfill
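One way to picture the reserve-then-write idea in the abstract above is the sketch below, which holds at most one queue lock at any moment. It is an assumption-laden illustration rather than the patented method: `BoundedQueue`, `try_reserve`, `cancel`, `commit`, and `write_all` are hypothetical names.

```python
import threading

class BoundedQueue:
    """A fixed-capacity queue with its own lock and a reservation counter."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.reserved = 0
        self.lock = threading.Lock()

    def try_reserve(self):
        with self.lock:  # only this queue's lock is held
            if len(self.items) + self.reserved < self.capacity:
                self.reserved += 1
                return True
            return False

    def cancel(self):
        with self.lock:
            self.reserved -= 1

    def commit(self, item):
        with self.lock:
            self.reserved -= 1
            self.items.append(item)

def write_all(queues, portions):
    """Reserve a slot in every queue, then write; never hold two locks at once."""
    reserved = []
    for q in queues:
        if q.try_reserve():
            reserved.append(q)
        else:
            for r in reserved:  # roll back the earlier reservations
                r.cancel()
            return False
    for q, portion in zip(queues, portions):
        q.commit(portion)
    return True

a, b = BoundedQueue(1), BoundedQueue(1)
first = write_all([a, b], ["x", "y"])    # both queues have space
second = write_all([a, b], ["p", "q"])   # both queues are now full
```

Because a reservation counts against capacity, a slot promised to one writer cannot be taken by another between the check and the write.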
Abstract: Processing data includes accepting information characterizing values of a first field in records of a first data source and information characterizing values of a second field in records of a second data source. Quantities characterizing a relationship between the first field and the second field are computed based on the accepted information. Information relating the first field and the second field is presented.
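The abstract above does not say which quantities are computed. As one plausible reading, the sketch below takes a value-count summary of each field and derives overlap and uniqueness figures from them. The function `relate_fields` and the keys of its result are hypothetical.

```python
from collections import Counter

def relate_fields(census_a, census_b):
    """Compare two fields using only their value -> count summaries."""
    shared = census_a.keys() & census_b.keys()
    return {
        "shared_values": len(shared),
        "only_in_first": len(census_a.keys() - shared),
        "only_in_second": len(census_b.keys() - shared),
        # True when every shared value occurs exactly once in that source.
        "unique_in_first": all(census_a[v] == 1 for v in shared),
        "unique_in_second": all(census_b[v] == 1 for v in shared),
    }

customers = Counter([1, 2, 3])   # field values in the first source
orders = Counter([1, 1, 2, 4])   # field values in the second source
summary = relate_fields(customers, orders)
```

A result where shared values are unique in the first source but repeated in the second suggests a one-to-many relationship between the fields.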
Abstract: Processing data includes identifying a plurality of subsets of fields of data records of a data source, determining co-occurrence statistics for each of the plurality of subsets, and identifying one or more of the plurality of subsets as having a functional relationship among the fields of the identified subset.
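A minimal sketch of the co-occurrence idea in the abstract above, restricted to ordered pairs of fields for brevity: field `a` is treated as determining field `b` when the number of distinct `(a, b)` pairs equals the number of distinct `a` values. The function name `functional_pairs` and the sample records are assumptions.

```python
from itertools import permutations

def functional_pairs(records, fields):
    """Return ordered pairs (a, b) where each value of a co-occurs with one value of b."""
    found = []
    for a, b in permutations(fields, 2):
        cooccur = {(r[a], r[b]) for r in records}
        distinct_a = {r[a] for r in records}
        if len(cooccur) == len(distinct_a):
            found.append((a, b))
    return found

records = [
    {"zip": "02139", "city": "Cambridge", "name": "Ann"},
    {"zip": "02139", "city": "Cambridge", "name": "Bob"},
    {"zip": "02421", "city": "Lexington", "name": "Ann"},
]
pairs = functional_pairs(records, ["zip", "city", "name"])
```

On real data an exact equality test is brittle; a tolerance for a small fraction of exceptions would be a natural extension.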
Abstract: Ordering parameters in a graph-based computation includes determining a desired first ordering of a set of parameters associated with graph elements in a computation graph; determining an ordering constraint for the set of parameters; and determining a second ordering of the set of parameters that satisfies the ordering constraint according to the desired first ordering.
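The abstract above can be read as a topological sort that uses the desired ordering as a tie-breaker. The sketch below follows that reading; it is an assumption, not the patented procedure, and `constrained_order` and the parameter names are hypothetical.

```python
def constrained_order(desired, must_precede):
    """Order parameters so each (before, after) constraint holds,
    choosing the earliest parameter in the desired ordering when several are ready."""
    rank = {p: i for i, p in enumerate(desired)}
    blockers = {p: set() for p in desired}
    for before, after in must_precede:
        blockers[after].add(before)
    ordered, remaining = [], set(desired)
    while remaining:
        ready = [p for p in remaining if not (blockers[p] & remaining)]
        if not ready:
            raise ValueError("ordering constraints contain a cycle")
        nxt = min(ready, key=rank.__getitem__)
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

desired = ["out_file", "date", "in_dir"]
constraints = [("date", "out_file")]  # out_file is computed from date
second_ordering = constrained_order(desired, constraints)
```

With no constraints the function returns the desired ordering unchanged.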
Abstract: An approach to performing graph-based computation uses one or both of an efficient startup approach and efficient control using process pools. Efficient startup of a graph-based computation involves precomputing data representing a runtime structure of a computation graph such that an instance of the computation graph is formed using the precomputed data for the required type of graph to form the runtime data structure for the instance of the computation graph. Pools of processes that are each suitable for performing computations associated with one or more vertices of the computation graphs are formed such that at runtime, members of these pools of processes are dynamically assigned to particular vertices of instances of computation graphs when inputs are available for processing at those vertices.
Abstract: A number of tasks are defined according to a dependency graph. Multiple parameter contexts are maintained, each associated with a different scope of the tasks. A parameter used in a first of the tasks is bound to a value. This binding includes identifying a first of the contexts according to the dependency graph and retrieving the value for the parameter from the identified context.
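The abstract above does not specify how a context is identified from the dependency graph. One possible interpretation, sketched below, searches the task's own context first, then the contexts of the tasks it depends on, and finally a global context. The function `bind` and all task and parameter names are hypothetical.

```python
from collections import deque

def bind(param, task, depends_on, contexts, global_ctx):
    """Find a value for param in the nearest context, walking upstream dependencies."""
    queue, seen = deque([task]), {task}
    while queue:
        current = queue.popleft()
        ctx = contexts.get(current, {})
        if param in ctx:
            return ctx[param]
        for upstream in depends_on.get(current, []):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return global_ctx[param]  # raises KeyError if the parameter is unbound

depends_on = {"report": ["load"], "load": ["extract"]}
contexts = {"extract": {"date": "2006-05-16"}, "load": {"table": "sales"}}
global_ctx = {"env": "dev"}

date = bind("date", "report", depends_on, contexts, global_ctx)
env = bind("env", "report", depends_on, contexts, global_ctx)
```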
Abstract: Aggregating data includes accepting a first data set that includes records, each record holding a value for each of a plurality of fields. A second data set is generated from the first data set. The second data set includes one or more aggregated records each corresponding to one or more records from the first data set that match values in a subset of fields. A third data set is generated from the second data set. The third data set includes one or more aggregated records each corresponding to one or more aggregated records of the second data set that match values in a subset of fields. An aggregate value associated with an aggregated record in the third data set represents a result of performing a non-cascadable operation on values associated with a plurality of records from the first data set.
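Counting distinct values is a common example of a non-cascadable operation: distinct counts taken over separate groups cannot simply be added together. The sketch below, which assumes hypothetical `region` and `customer` fields, shows how an intermediate data set keyed on the counted field still lets the final aggregate be built in two stages.

```python
from collections import Counter, defaultdict

def aggregate(first):
    """Roll records up twice: by (region, customer), then by region."""
    # Second data set: one aggregated record per (region, customer) pair.
    second = Counter((r["region"], r["customer"]) for r in first)
    # Third data set: one aggregated record per region, built only from `second`.
    third = defaultdict(lambda: {"purchases": 0, "distinct_customers": 0})
    for (region, _customer), n in second.items():
        third[region]["purchases"] += n           # cascadable: sums of counts
        third[region]["distinct_customers"] += 1  # one per customer in the region
    return dict(second), dict(third)

first = [
    {"region": "east", "customer": "ann"},
    {"region": "east", "customer": "ann"},
    {"region": "east", "customer": "bob"},
    {"region": "west", "customer": "ann"},
]
second, third = aggregate(first)
```

The distinct-customer figure in the third data set describes the original records even though it is computed without rereading them.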