Navigation of tree data structures
Data items are represented by trees and stored in a database, the collection of data items defining a forest. Queries and masks are also represented by trees. A method for navigating the forest of data items is disclosed in the context of a graphical user interface. A set of operations on trees is defined such that the data items can be queried on the basis of structure as well as node values. That is, the query can include a specification of the relationship between nodes in a tree, as well as the data in the nodes themselves. Exemplary implementations of such operations are disclosed in the context of a database update procedure. Additionally disclosed are methods for efficiently storing and processing the forest of data items.
This application claims the benefit of U.S. Provisional Application No. 60/504,400, filed Sep. 19, 2003, the entire disclosure of which is hereby incorporated herein by reference. This application is also related to U.S. patent application entitled “Update of a Tree-Based Database” by David M. Ziemann and John F. Samuel, and to U.S. patent application entitled “Processing of Tree Data Structures” by David M. Ziemann and John F. Samuel, both of which are filed concurrently herewith and incorporated herein by reference in their entirety.
FIELD OF THE INVENTION
This invention relates to the field of databases, and, more specifically, to a system that uses tree data structures to represent, exchange, query, store, update, and navigate data.
BACKGROUND OF THE INVENTION
This invention is often discussed herein in the context of financial risk management, but this invention is much broader in its application because the tree-based database according to this invention may be used for literally any type of heterogeneous data.
Financial risk-management systems rely on complex object models for the storage, query and update of data. Each point of the data is derived from a diverse set of inputs that depend on market data, trade details and configured parameters. The number and type of these inputs differs widely among different categories of risk exposure; however, the risk manager needs to see the data as a single, unified search space for reporting and analysis.
The same applies for any other type of complex system that relies on a wide range of different data structures. As will be discussed in the following sections, conventional database models fail to adequately support such complex systems and fail to provide a unified search space for the query and aggregation of results.
The Relational Database Model
The conventional relational model is often used as the foundation for financial risk management systems. The strength of the relational model is presenting and manipulating tabular collections of data where each column of data has the same structure. However, it does not easily lend itself to representing collections of data with diverse structures.
If a relational database model is adapted to support widely diverse data structures, one of two approaches is typically adopted. In the first approach, the structural aspects of the database are increased in complexity in order to accommodate the diverse data structures. For instance, every distinct structure may be represented with a separate table linked to a primary table with a “key.” Each distinct structure also has its own procedures for query and update. This approach is very inflexible, and in the worst case, the addition of a new structure requires all the procedures to be rewritten.
In the second approach, the relational database is simplified in order to make all data elements fit the same structure. This approach leads to redundancy and expansion of storage requirements in the database. For example, a scalar value (i.e., a zero dimensional point) might have additional, redundant x and y values so that one and two-dimensional points can be stored in the same table. Any new structures that do not fit the database structure either have to be trimmed to fit or result in a restructuring of all the data and procedures.
Object-Oriented Databases
Object-oriented databases address the problems of the domain model by encapsulating behavior and state into a single object. Provided that the objects implement an appropriate application program interface (API), it is possible to have collections of heterogeneous data. Further, new structures may be added to the database without requiring a major code rewrite. However, the drawback of object-oriented databases is that they are extremely difficult to access by external systems.
XML
XML, while not really a system, is widely used for the transfer of data between systems and has a growing following in the financial-risk management world. In XML, data is structured in a tree-like fashion, with named parts.
As a means to transfer data, XML has many advantages. For example, many APIs to other applications exist and it is flexible and extensible. However, XML does not offer any storage, update or query mechanism and the XML representation of objects is too verbose to be an option for storing data.
SUMMARY OF THE INVENTION
These problems are solved and a technical advance is achieved in the art by a system and method that uses trees to represent, exchange, query, store and update data. In terms of what can be represented, this system has the flexibility of XML, but also provides a storage mechanism, which XML does not. The tree system facilitates easy access by external systems by producing tabular output, in a manner similar to the relational model. Unlike the relational model, however, it is possible to support diverse structures within the same search space. Finally, the tree model supports a simple external interface using a tree-valued language that can be used by external systems.
In this system, data, queries, and masks are represented by trees. For instance, queries are represented by special kinds of trees, known as partially-bound trees, where parts of the tree are defined to be any allowable structure, i.e., unbound. Masks are also represented as special kinds of trees, where parts of the tree are undefined. Masks may be used to generate Query Trees. Accordingly, queries and masks can be manipulated using the same operations as the underlying data. By representing data, queries, and masks as trees, storage and operations between trees are simplified.
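By way of illustration only, the following minimal Python sketch shows one way that data, queries, and masks could share a single tree type, so that one routine can operate on all three. The `Node` class, the `special_paths` helper, and the root node name "result" are assumptions made for this sketch and are not taken from the disclosure; the markers "?" and "_" follow the notation used later in this specification.

```python
from dataclasses import dataclass, field
from typing import List

UNBOUND = "?"    # special value: any value, any subtree
UNDEFINED = "_"  # special value: no value, no subtree


@dataclass
class Node:
    """Hypothetical node type: a name, a value, and ordered children."""
    name: str
    value: str
    children: List["Node"] = field(default_factory=list)


def special_paths(node, prefix=""):
    """Return (path, value) for every node that carries a special value."""
    path = f"{prefix}/{node.name}"
    found = [(path, node.value)] if node.value in (UNBOUND, UNDEFINED) else []
    for child in node.children:
        found.extend(special_paths(child, path))
    return found


# The same type represents a data item, a query, and a mask.
data = Node("result", "resultMTM",
            [Node("businessDate", "14 Jul. 2003"), Node("vc", "vc")])
query = Node("result", "resultMTM",
             [Node("businessDate", UNBOUND), Node("vc", "vc")])
mask = Node("result", "resultMTM",
            [Node("businessDate", UNBOUND), Node("vc", UNDEFINED)])
```

Because all three are ordinary trees, `special_paths` runs unchanged on each of them; only the presence of special values distinguishes a query or a mask from a data item.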
According to one aspect of the invention, a method is disclosed for navigating a collection of tree data structures stored in a computer-readable database, the method including constraining a first node of a query tree stored in a computer-readable memory to a first value, and making accessible a first set of nodes of the query tree that are logically connected (hereinafter “connected”) to the first node constrained to the first value. The method also includes constraining a second node in the first set of nodes to a second value. Additionally, the method identifies a tree in the collection of tree data structures that contains (1) a first matching node equal in position to the first node and equal to the first value, and (2) a second matching node equal in position to the second node and equal to the second value. Data in a select node of the identified tree is accessed. The select node may be the first matching node, the second matching node, or a node connected to the first or second matching nodes of the identified tree.
The method for navigating a collection of tree data structures may further include making accessible a second set of nodes of the query tree that are connected to the second node constrained to the second value. The select node is equal in position to the first node of the query tree, the second node of the query tree, or a node in the accessible first or second sets of nodes of the query tree. In an exemplary embodiment, the first value and the second value are selected from the group consisting of a data value, an unbound special value, and an undefined special value.
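The navigation method described above may be sketched as follows, under stated assumptions. Each tree is flattened to a mapping from a path (the tuple of node names from the root) to a node value, so that "equal in position" becomes "same path". The function names, the node names "result" and "value", and the sample values are hypothetical. As a further assumption, the set of nodes made accessible after a constraint is derived from the trees that currently match.

```python
# Hypothetical forest of three flattened trees: {path: value}.
FOREST = [
    {("result",): "resultMTM", ("result", "businessDate"): "14 Jul. 2003",
     ("result", "value"): "10.0"},
    {("result",): "resultMTM", ("result", "businessDate"): "15 Jul. 2003",
     ("result", "value"): "12.5"},
    {("result",): "resultDelta", ("result", "businessDate"): "14 Jul. 2003",
     ("result", "value"): "0.3"},
]


def constrain(query, path, value):
    """Return a copy of the query with one more node constrained."""
    return {**query, path: value}


def matches(tree, query):
    """True if the tree has an equal value at every constrained position."""
    return all(tree.get(path) == value for path, value in query.items())


def child_nodes(forest, query, parent):
    """Names of nodes directly under `parent` in the trees matching the query."""
    names = set()
    for tree in forest:
        if matches(tree, query):
            names.update(p[-1] for p in tree
                         if len(p) == len(parent) + 1 and p[:-1] == parent)
    return sorted(names)


def select(forest, query, path):
    """Values of the select node in every identified tree."""
    return [tree[path] for tree in forest
            if matches(tree, query) and path in tree]


# Constrain the first node, expose its children, constrain a second node,
# then read the select node of the identified tree.
query = constrain({}, ("result",), "resultMTM")
first_set = child_nodes(FOREST, query, ("result",))
query = constrain(query, ("result", "businessDate"), "14 Jul. 2003")
values = select(FOREST, query, ("result", "value"))
```

In this sketch, only the first tree satisfies both constraints, so `values` holds the single entry "10.0". The third tree shares the business date but is excluded by the root constraint.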
According to another aspect of the invention, in a computer system having a graphical user interface including a display device and one or more input devices, a method is disclosed for navigating a collection of tree data structures stored in a computer-readable database. This method includes receiving a first value from the one or more input devices to which a first node of a query tree stored in a computer-readable memory is constrained, and displaying with the display device a first set of nodes of the query tree that are connected to the first node constrained to the first value. This method also includes identifying a tree in the collection of tree data structures that contains a first matching node equal in position to the first node and equal to the first value, and displaying with the display device data in a select node of the identified tree. In an exemplary embodiment, the select node is the first matching node or a node connected to the first matching node of the identified tree.
This method may further include receiving a second value from the one or more input devices to which a second node in the first set of nodes is constrained, and displaying with the display device a second set of nodes of the query tree that are connected to the second node constrained to the second value. In one example, the first value and the second value are selected from the group consisting of a data value, an unbound special value, and an undefined special value.
In one scenario, identifying the tree identifies a tree in the collection of tree data structures that contains (1) a first matching node equal in position to the first node and equal to the first value, and (2) a second matching node equal in position to the second node and equal to the second value. In this case, the select node is the first matching node, the second matching node, or a node connected to the first or second matching nodes of the identified tree.
In another scenario, identifying the tree identifies a plurality of trees in the collection of tree data structures that contain (1) a first matching node equal in position to the first node and equal to the first value, and (2) a second matching node equal in position to the second node and equal to the second value. In this case, displaying the data in the select node displays data in a plurality of select nodes of each of the identified plurality of trees. Each of the plurality of select nodes is the first matching node, the second matching node, or a node connected to the first or second matching nodes of the respective identified trees. Also, each of the plurality of select nodes is equal in position to the first node of the query tree, the second node of the query tree, or a node in the first or second sets of nodes of the query tree. Displaying the data in the plurality of select nodes may display, via the display device, the data of the plurality of select nodes in a tabular format. Also in this scenario, the method may further include displaying the query tree in a constraint pane, wherein the first set of nodes and the second set of nodes are displayed in the constraint pane. The data in the plurality of select nodes is displayed in a data pane.
According to yet another aspect of the invention, a system is disclosed for navigating a collection of tree data structures. The system includes a database component operative to maintain a database of tree data structures, a memory component operative to store a query tree, an input component, a display component, and a processing component. The processing component is communicatively connected to the database component, the memory component, the input component, and the display component. Further, the processing component performs actions including interpreting a first signal from the input component as an instruction to constrain a first node of the query tree to a first value, and constraining the first node of the query tree to the first value. The processing component performs actions further including transmitting an instruction to the display component to display a first set of nodes of the query tree that are connected to the first node constrained to the first value. The processing component also communicates with the database component to identify a tree in the database of tree data structures that contains a first matching node equal in position to the first node and equal to the first value. The processing component is additionally programmed to transmit an instruction to the display component to display data in a select node of the identified tree. In one example, the select node is the first matching node or a node connected to the first matching node of the identified tree.
The processing component may also be programmed to perform actions further comprising interpreting a second signal from the input component as an instruction to constrain a second node in the first set of nodes to a second value, and constraining the second node to the second value. In this case, the processing component transmits an instruction to the display component to display a second set of nodes of the query tree that are connected to the second node constrained to the second value. Also in this case, communicating with the database component communicates with the database component to identify a tree in the database of tree data structures that contains (1) a first matching node equal in position to the first node and equal to the first value, and (2) a second matching node equal in position to the second node and equal to the second value. In this scenario, the select node is the first matching node, the second matching node, or a node connected to the first or second matching nodes of the identified tree. Also, the select node is equal in position to the first node of the query tree, the second node of the query tree, or a node in the first or second set of nodes of the query tree. Further, in this example, the first value and the second value are selected from the group consisting of a data value, an unbound special value, and an undefined special value.
According to still yet another aspect of this invention, a method is disclosed for updating a collection of tree data structures in a computer-readable database. This method includes applying a mask to input data to generate a query tree. The mask, the input data, and the query tree each correspond to a tree data structure. This method also includes storing the query tree in a computer-readable memory, applying the query tree to the collection of tree data structures in the database to identify an identified tree consistent with the query tree, deleting the identified tree from the database, and adding the input data to the database. The input data may include a data node having a value, the mask may have an extending node at a same relative position as the data node, and the query tree may include a query node at the same relative position as the data node and the extending node. In this case, when the mask is applied to the input data to generate the query tree, the extending node propagates the value of the data node to the query node, and the identified tree comprises an identified node having the same relative position as the query node and having the value of the query node. The collection of tree data structures may include heterogeneous data.
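The update method described above may be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: trees are flattened to mappings from path to value, an extending node is assumed to be written with the "?" marker, and mask positions marked "_" are assumed to play no part in the generated query. The names `make_query`, `consistent`, and `update`, the node names "result" and "value", and the sample values are hypothetical.

```python
EXTEND = "?"     # assumption: extending nodes use the unbound marker
UNDEFINED = "_"  # assumption: these positions are left out of the query


def make_query(mask, data):
    """Each extending node copies the value found at the same position in the data."""
    return {path: data[path] for path, marker in mask.items()
            if marker == EXTEND and path in data}


def consistent(tree, query):
    """True if the tree has the query's value at every query position."""
    return all(tree.get(path) == value for path, value in query.items())


def update(database, mask, data):
    """Delete every tree consistent with the generated query, then add the data."""
    query = make_query(mask, data)
    survivors = [tree for tree in database if not consistent(tree, query)]
    return survivors + [data]


MASK = {("result",): EXTEND,
        ("result", "businessDate"): EXTEND,
        ("result", "value"): UNDEFINED}

old_a = {("result",): "resultMTM", ("result", "businessDate"): "14 Jul. 2003",
         ("result", "value"): "10.0"}
old_b = {("result",): "resultMTM", ("result", "businessDate"): "15 Jul. 2003",
         ("result", "value"): "12.5"}
new = {("result",): "resultMTM", ("result", "businessDate"): "14 Jul. 2003",
       ("result", "value"): "11.0"}

database = update([old_a, old_b], MASK, new)
```

Here the mask makes the result type and business date act as the identifying positions, so the incoming tree replaces `old_a` and leaves `old_b` untouched. Note that, in this sketch, a mask with no extending nodes would yield an empty query that is consistent with every tree.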
In one example of this method for updating a collection of tree data structures, the input data may be a unit of input data and the method may further include receiving a set of input data comprising a plurality of input data including the unit of input data, each of the set of input data corresponding to a tree data structure. In this case, the method also includes generating the mask by identifying a common characteristic among the set of input data, storing the mask in a computer-readable memory, and adding the set of input data to the database. The common characteristic among the set of input data includes a matching node in each of the input data, wherein each matching node has a same value and a same relative position as every other matching node. Further, generating the mask generates the mask to have an extending node having the same relative position as each of the matching nodes, and the query tree includes a query node having the same relative position as each of the matching nodes and the extending node. When the mask is applied to the unit of input data to generate the query tree, the extending node propagates the value of the unit of input data's matching node to the query node. Additionally, the identified tree includes an identified node having the value and the same relative position as the query node.
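One possible way to generate a mask from a common characteristic among a set of input data is sketched below. It assumes flattened trees (path to value), marks every position whose value is the same in all inputs as an extending node, and marks every other position as undefined. The function name `generate_mask`, the markers, and the sample data are assumptions made for illustration.

```python
EXTEND = "?"     # assumption: marker for an extending node
UNDEFINED = "_"  # assumption: marker for a position excluded from the query


def generate_mask(inputs):
    """Mark positions with one shared value as extending; all others as undefined."""
    all_paths = set().union(*(tree.keys() for tree in inputs))
    mask = {}
    for path in sorted(all_paths):
        values = [tree.get(path) for tree in inputs]
        # A matching node must be present in every input with the same value.
        shared = values[0] is not None and all(v == values[0] for v in values)
        mask[path] = EXTEND if shared else UNDEFINED
    return mask


batch = [
    {("result",): "resultMTM", ("result", "businessDate"): "14 Jul. 2003",
     ("result", "value"): "10.0"},
    {("result",): "resultMTM", ("result", "businessDate"): "14 Jul. 2003",
     ("result", "value"): "12.5"},
]
mask = generate_mask(batch)
```

For this batch, the result type and business date agree across both inputs and become extending nodes, while the differing "value" position is marked undefined.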
This method for updating a collection of tree data structures may also include applying the mask to a second set of input data to generate a plurality of query trees each corresponding to a tree data structure, and each of the input data of the second set of input data corresponding to a tree data structure. In this scenario, the method includes storing the plurality of query trees in a computer-readable memory, and applying the plurality of query trees to the collection of tree data structures in the database to identify a plurality of identified trees consistent with at least one of the plurality of query trees. The plurality of identified trees from the database are deleted and the second set of input data are added to the database. Also in this case, each of the input data of the second set of input data comprises a data node, and each data node has (1) a value and (2) a same relative position as every other data node. The mask has an extending node at the same relative position as each of the data nodes, and each of the plurality of query trees includes a query node at the same relative position as each of the data nodes and the extending node. When the mask is applied to the second set of input data to generate the plurality of query trees, the extending node propagates the value of each of the data nodes to each of the respective query nodes. The query nodes each have a different value, and the plurality of identified trees each include an identified node having the same relative position as each of the query nodes and having a same value as one of the query nodes.
According to still yet another aspect of this invention, a system is disclosed for updating a collection of tree data structures. The system includes a database component operative to maintain a database comprising the collection of tree data structures, a memory component, an input component, and a processing component. The processing component is communicatively connected to the database component, the memory component, and the input component. The processing component performs actions including receiving input data from the input component, the input data corresponding to a tree data structure, and applying a mask to the input data to generate a query tree, the mask and the query tree each corresponding to a tree data structure. The processing component is also programmed for storing the query tree with the memory component, and applying the query tree to the tree data structures in the database to identify an identified tree consistent with the query tree. The processing component instructs the database component to delete the identified tree from the database and to add the input data to the database. The collection of tree data structures in the database may include heterogeneous data.
In an example of this system for updating a collection of tree data structures, the input data is a unit of input data, and the processing component performs actions further including receiving a set of input data comprising a plurality of input data including the unit of input data, each of the set of input data corresponding to a tree data structure, and generating the mask by identifying a common characteristic among the set of input data. In this case, the processor is also programmed for storing the mask with the memory component and instructing the database component to add the set of input data to the database. The common characteristic among the set of input data comprises a matching node in each of the input data, wherein each matching node has a same value and a same relative position as every other matching node. Also in this example, generating the mask generates the mask to have an extending node having the same relative position as each of the matching nodes, and the query tree includes a query node having the same relative position as each of the matching nodes and the extending node. When the mask is applied to the unit of input data to generate the query tree, the extending node propagates the value of the unit of input data's matching node to the query node. Further, the identified tree includes an identified node having the value and the same relative position as the query node.
In another example of this system for updating a collection of tree data structures, the input data comprises a data node having a value. The mask has an extending node at a same relative position as the data node, and the query tree comprises a query node at the same relative position as the data node and the extending node. In this case, when the mask is applied to the input data to generate the query tree, the extending node propagates the value of the data node to the query node. And, the identified tree comprises an identified node having the same relative position as the query node and having the value of the query node.
The processing component of this system for updating a collection of tree data structures may be programmed to perform actions further including applying the mask to a second set of input data to generate a plurality of query trees, each corresponding to a tree data structure, and each of the input data of the second set of input data corresponding to a tree data structure. In this case, the processor is also programmed for storing the plurality of query trees with the memory component, and applying the plurality of query trees to the tree data structures in the database to identify a plurality of identified trees consistent with at least one of the plurality of query trees. The processor may instruct the database component to delete the plurality of identified trees from the database and to add the second set of input data to the database. In this case, each of the input data of the second set of input data comprises a data node, and each data node has (1) a value, and (2) a same relative position as every other data node. The mask has an extending node at the same relative position as each of the data nodes, and each of the plurality of query trees comprises a query node at the same relative position as each of the data nodes and the extending node. When the mask is applied to the second set of input data to generate the plurality of query trees, the extending node propagates the value of each of the data nodes to each of the respective query nodes. The query nodes each have a different value, and the plurality of identified trees each comprise an identified node having the same relative position as each of the query nodes and having a same value as one of the query nodes.
According to still yet another aspect of this invention, a method is disclosed for processing a collection of tree data structures in a computer-readable database. This method includes identifying a set of trees in the collection of tree data structures, each tree in the set of trees having a same structure. The method also includes forming a pattern having the same structure as each tree in the set of trees, and processing the pattern. The pattern is processed in lieu of processing each tree in the set of trees. Also, processing the pattern may comprise applying a query tree to the pattern.
In one example of this method for processing a collection of tree data structures, each tree in the set of trees includes a leaf node having a value, and the method further includes storing the pattern in a computer-readable memory, and storing the leaf node of each tree in the set of trees in a computer-readable memory. The pattern is stored in lieu of storing the same structure of each tree in the set of trees.
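Storing a pattern in lieu of the repeated structure may be sketched as follows. The sketch assumes that a tree is a nested mapping whose innermost entries are leaf values, that the pattern is the ordered tuple of leaf paths, and that every non-leaf node has at least one child. The functions `split`, `compress`, and `rebuild` and the sample trees are hypothetical.

```python
def split(tree):
    """Separate a nested-dict tree into (pattern, leaves).

    The pattern is the tuple of leaf paths in sorted order; the leaves are
    the corresponding values in the same order.
    """
    paths, values = [], []

    def walk(node, prefix):
        for name, child in sorted(node.items()):
            if isinstance(child, dict):
                walk(child, prefix + (name,))
            else:
                paths.append(prefix + (name,))
                values.append(child)

    walk(tree, ())
    return tuple(paths), tuple(values)


def compress(trees):
    """Store each distinct pattern once, with one leaf tuple per tree."""
    store = {}
    for tree in trees:
        pattern, leaves = split(tree)
        store.setdefault(pattern, []).append(leaves)
    return store


def rebuild(pattern, leaves):
    """Recreate a tree from a stored pattern and one tuple of leaf values."""
    tree = {}
    for path, value in zip(pattern, leaves):
        node = tree
        for name in path[:-1]:
            node = node.setdefault(name, {})
        node[path[-1]] = value
    return tree


t1 = {"result": {"businessDate": "14 Jul. 2003", "value": "10.0"}}
t2 = {"result": {"value": "12.5", "businessDate": "15 Jul. 2003"}}
t3 = {"result": {"businessDate": "14 Jul. 2003", "vc": {"env": "baseEnv"}}}
store = compress([t1, t2, t3])
```

In this sketch `t1` and `t2` share one pattern, so the structure is held once and only their leaf values are stored separately; `t3` has its own pattern.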
According to still yet another aspect of this invention, a second method is disclosed for processing a collection of tree data structures in a computer-readable database. This method includes partitioning the collection of tree data structures into disjoint sets of trees, each set of trees comprising trees of a same structure. The method also includes forming a set of patterns, each pattern corresponding to one of the sets of trees, and each pattern having the same structure as its corresponding set of trees. Further, the method includes processing the set of patterns. The set of patterns is processed in lieu of processing each tree in each of the sets of trees, and the processing includes applying a query tree to each pattern in the set of patterns. Additionally, processing the set of patterns may process the set of patterns with distributed processors, each distributed processor processing one or more of the patterns in the set of patterns.
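The partitioning and pattern-level processing described above may be sketched as follows. The sketch assumes flattened trees (path to value), uses the sorted tuple of paths as the pattern, and uses a thread pool merely as a stand-in for distributed processors. The names `partition`, `pattern_admits`, and `run_query`, and the value "stressEnv", are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(trees):
    """Group flattened trees into disjoint sets keyed by their structure."""
    groups = {}
    for tree in trees:
        groups.setdefault(tuple(sorted(tree)), []).append(tree)
    return groups


def pattern_admits(pattern, query):
    """Structural test, run once per pattern instead of once per tree."""
    return all(path in pattern for path in query)


def run_query(trees, query, workers=2):
    """Apply the query to each pattern, then compare values only where admitted."""
    groups = partition(trees)
    patterns = list(groups)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        admitted = list(pool.map(lambda p: pattern_admits(p, query), patterns))
    hits = []
    for pattern, ok in zip(patterns, admitted):
        if ok:
            hits += [tree for tree in groups[pattern]
                     if all(tree[path] == value for path, value in query.items())]
    return hits


a = {("result",): "resultMTM", ("result", "vc"): "vc",
     ("result", "vc", "env"): "baseEnv"}
b = {("result",): "resultMTM", ("result", "vc"): "vc",
     ("result", "vc", "env"): "stressEnv"}
c = {("result",): "resultMTM", ("result", "businessDate"): "14 Jul. 2003"}
TREES = [a, b, c]
QUERY = {("result", "vc", "env"): "baseEnv"}
hits = run_query(TREES, QUERY)
```

The pattern of tree `c` lacks the queried position, so that whole set is rejected by a single structural test and none of its trees is inspected individually.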
In one example of this second method for processing a collection of tree data structures, each tree in each of the sets of trees includes a leaf node having a value, and the method further includes storing the set of patterns in a computer-readable memory, and storing the leaf node of each tree in each of the sets of trees in a computer-readable memory. The set of patterns is stored in lieu of storing a structure of each tree in each of the sets of trees.
According to still yet another aspect of this invention, a system is disclosed for processing a collection of tree data structures. The system includes a database component operative to maintain a database comprising the collection of tree data structures and a processing component communicatively connected to the database component. By communicating with the database component, the processing component performs actions including identifying a set of trees in the collection of tree data structures, each tree in the set of trees having an identical structure. The processing component is also programmed for forming a pattern having the identical structure as each tree in the set of trees, and processing the pattern. The processing component processes the pattern in lieu of processing each tree in the set of trees.
This system may further include an input component communicatively connected to the processing component. In such a case, the processing component performs actions further comprising receiving information from the input component and generating a query tree based upon the received information. Also in this case, processing the pattern by the processing component includes applying the query tree to the pattern.
This system may further include a memory component communicatively connected to the processing component. Also, each tree in the set of trees includes a leaf node. The processing component stores the pattern with the memory component and stores the leaf node of each tree in the set of trees with the memory component. The pattern is stored in lieu of storing the same structure of each tree in the set of trees.
According to still yet another aspect of this invention, a second system is disclosed for processing a collection of tree data structures. This system includes a database component operative to maintain a database comprising the collection of tree data structures and a processing component communicatively connected to the database component. The processing component performs actions including partitioning the collection of tree data structures in the database into disjoint sets of trees, each set of trees comprising trees having an identical structure, and the partitioning being assisted by communicating with the database component. The processing component is also programmed for forming a set of patterns, each pattern corresponding to one of the sets of trees, and each pattern having the same structure as its corresponding set of trees. Further, the processing component processes the set of patterns. The processing component processes the set of patterns in lieu of processing each tree in each of the sets of trees. Additionally, the processing component may comprise multiple distributed processors, each distributed processor processing one or more of the patterns in the set of patterns.
This system may also include an input component communicatively connected to the processing component. In this case, the processing component performs actions further including receiving information from the input component, and generating a query tree based upon the received information. The processing component processes the set of patterns by applying the query tree to each pattern in the set of patterns.
This system may further include a memory component communicatively connected to the processing component. Also, each tree in each of the sets of trees includes a leaf node, and the processing component is additionally programmed for storing the set of patterns with the memory component, and storing the leaf node of each tree in each of the sets of trees with the memory component. The set of patterns is stored in lieu of storing a structure of each tree in each of the sets of trees.
BRIEF DESCRIPTION OF THE DRAWINGS
A more complete understanding of this invention may be obtained from a consideration of this specification taken in conjunction with the drawings, in which:
Navigation of a Collection of Trees Using a Query Tree
Each row of the query tree shown in constraint pane 102 identifies a node in the query tree. Each node in the query tree has a node name and a node value, represented herein with a description of the node name, followed by an equal sign (“=”), which is then followed by a description of the node value. For instance, the second row of the query tree shown in the constraint pane 102 in
The layout of the query tree shown in constraint pane 102 of
Walking through the constraint process of
Processing now moves to step 2, where the query tree column 204 displays the set of possible child nodes for a root node constrained to value “resultMTM” 212. At this time, and at each time the query tree is further constrained, the processor of the computer operating this system may automatically communicate with the database to search for all trees in the collection of trees that have a root node with value “resultMTM” 212. More generally, an attempt is made to identify trees within the collection of trees that have a node (“matching node”) equal in position to the root node of the query tree and equal in value to that of the root node, which in this case is “resultMTM” 212. As shown in column 208 of
Proceeding with the example of
In step 3, the user selects the node “vc=” 218 in column 204. There is only one possible node value for this node, which is “vc” 220, as shown in column 206. Therefore, the user decides to select node value “vc” 220 for this node. When node “vc” 218 is constrained to node value “vc” 220, the query tree is expanded to expose a second set of possible child nodes “env,” “valueDate,” “scenarioLabel,” and “scenarioID,” as shown in column 204, step 4. These four child nodes are children of node “vc” 218 and are grandchildren of the root node 212.
In step 4, the user selects one of these child nodes “env” 222 in order to reduce the total number of trees 208 further. In this example, the user selects the node value “baseEnv” 224 in column 206 for node “env” 222. This selection results in the query tree of step 5 in column 204, which reveals possible child nodes for the node “env” when constrained to value “baseEnv” 224. The user then further constrains the query by selecting the node “finObject” 226, which currently has the unbound value “?.” The user then selects the value “trade” 228 in choice column 206.
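The step-by-step constraint process walked through above may be sketched as follows, recording at each step the values offered for the selected node and the number of trees that remain after the selection. The sketch assumes flattened trees (path to value); the root node name "result", the values "stressEnv" and "portfolio", and the function names are hypothetical and serve only to give the example more than one choice.

```python
ROOT = ("result",)
VC = ROOT + ("vc",)
ENV = VC + ("env",)
FIN = ENV + ("finObject",)

FOREST = [
    {ROOT: "resultMTM", VC: "vc", ENV: "baseEnv", FIN: "trade"},
    {ROOT: "resultMTM", VC: "vc", ENV: "baseEnv", FIN: "portfolio"},
    {ROOT: "resultMTM", VC: "vc", ENV: "stressEnv", FIN: "trade"},
]


def matching(forest, query):
    """Trees with an equal value at every constrained position."""
    return [tree for tree in forest
            if all(tree.get(path) == value for path, value in query.items())]


def choices(forest, query, path):
    """Distinct values found at `path` among trees matching the query so far."""
    return sorted({tree[path] for tree in matching(forest, query) if path in tree})


# Each step: list the offered values, apply the user's selection,
# then count the trees that still match.
steps = [(ROOT, "resultMTM"), (VC, "vc"), (ENV, "baseEnv"), (FIN, "trade")]
query, trace = {}, []
for path, value in steps:
    offered = choices(FOREST, query, path)
    query[path] = value
    trace.append((path[-1], offered, len(matching(FOREST, query))))
```

Each entry of `trace` corresponds to one row of the walkthrough: the node selected, the values that could be chosen for it, and the remaining tree count, which shrinks as the query tree is constrained further.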
Turning now to
At any step in column 202, the user can select any of the nodes displayed in the query tree for which a node value is to be viewed in tabular format in the data pane 104 of
Turning now to
The navigation example is continued at
The query tree can also be edited without changing its structure, as in a traditional query-by-example application. For example, the user can select the “businessDate” node 214 (step 2,
Tree Comparisons
Queries and updates make use of a pattern matching operation in accordance with another aspect of this invention. The pattern matching operation takes two trees as input and compares corresponding subtrees in each. Special node values Unbound (written “?”) and Undefined (written “_”) indicate allowable values and allowable subtree structure for any node having the special value.
The special value Unbound indicates any value and any subtree structure, whereas Undefined indicates no value and no subtree structure. Unbound generalizes all nodes having the same relative position and Undefined specializes all nodes having the same relative position. In other words, if a node “businessDate” of a query tree has an unbound node value, and a tree “X” in the collection of trees has a node “businessDate” with a node value of “14 Jul. 2003” in the same relative position as the node “businessDate” in the query tree, the node in the query tree “generalizes” the node in tree X. On the other hand, if the node “businessDate” in the query tree has an undefined special value, the node in the query tree “specializes” the node in tree X. A tree is complete if it has no special values.
Having two special values makes it possible to define a number of tree constructors as binary operations on trees. For example, a tree whose leaves are all special values can be used as a mask in the following manner. The mask operation takes a target tree and a mask and returns the tree created by removing all the subtrees in the target that correspond to undefined nodes in the mask. The unmasked part of the target may be identical in structure to the mask, or it may correspond to an unbound node in the mask. This means that a single mask may be used on a large variety of targets.
The unbound value is used to represent the parts of the mask where no constraint is applied to the substructure in the target, whereas the undefined value is used to represent those parts of the mask where all substructures are to be removed from the target.
The use of two special values allows the user to distinguish the two cases where:
- 1. the substructure is unconstrained; and
- 2. the substructure is constrained to be empty.
In the first case, any grammatical substructure is allowed. In the second case, there is no substructure.
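By way of illustration only, the mask operation described above may be sketched in Python as follows. The representation of a tree as a pair (value, children), the function name `apply_mask`, the representation of a removed subtree as a node with the Undefined value, and the treatment of a differing data value as a mismatch are assumptions made for this sketch.

```python
# A tree is modeled as a pair (value, children); children maps node names to trees.
UNBOUND, UNDEFINED = "?", "_"
ANY = (UNBOUND, {})  # assumption: a mask child that is not written out is unbound


def apply_mask(target, mask):
    """Remove the subtrees of target that correspond to Undefined nodes in mask."""
    (tv, tc), (mv, mc) = target, mask
    if mv == UNDEFINED:
        return (UNDEFINED, {})  # constrained to be empty: no value, no substructure
    if mv == UNBOUND:
        return target           # unconstrained: the whole substructure is kept
    if mv != tv:
        return (UNDEFINED, {})  # assumption: a differing data value is also removed
    # Identical structure so far: keep the value and descend into the children.
    return (tv, {n: apply_mask(c, mc.get(n, ANY)) for n, c in tc.items()})


target = ("tweak", {
    "curve": ("irCurve", {"ccy": ("USD", {})}),
    "point": ("curvePoint", {"maturity": ("1Y", {})}),
})
mask = ("tweak", {"curve": ANY, "point": (UNDEFINED, {})})

masked = apply_mask(target, mask)
```

Here the “curve” substructure survives because the mask leaves it unconstrained, while the “point” substructure is removed because the mask constrains it to be empty.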
Four trees are illustrated in
As illustrated in Tree 2, curve 806 may have the value “irCurve” 820. Curve 806, when constrained to the value irCurve 820, has child node “ccy” 823, as illustrated in Tree 2. Node ccy 823 is illustrated herein as having the value “USD.”
Curve 806 may also have the value “irSwaptionVol” 824, as illustrated in Tree 3 and Tree 4. Curve 806, when constrained to the value “irSwaptionVol” 824, has child nodes “ccy” 822 and “index” 826. Ccy 822 and index 826 are illustrated as having values of “USD” and “SWAP,” respectively.
As shown in Tree 2, point 808 may have the value “curvePoint” 828. When node point 808 is constrained to curvePoint 828, it has a child node “maturity” 830. In the example of Tree 2, maturity 830 has the value of “1Y.” As shown in Tree 4, point 808 may also have the value “irVolPoint” 832, which has the child nodes maturity 830, with value “1Y,” and “tenor” 834, having an unbound special value.
Continuing with
A node having an Unbound value generalizes any bound node, so each unbound node in Tree 1 either generalizes or is equal to the corresponding node in Tree 2. In this case, we say that Tree 1 “generalizes” Tree 2, because the nodes of Tree 1 having the same relative position as the nodes of Tree 2 are either unbound or equal to the corresponding nodes in Tree 2. Equal in this context means equal in value and equal in subtree structure (or substructure). For instance, curve 806 and point 808 are unbound in Tree 1, and tweakSpec 810a in Tree 1 is equal in value and substructure to the tweakSpec 810a node in Tree 2, both of which occupy the same relative position in each tree.
Stated the opposite way, Tree 2 “specializes” Tree 1 because Tree 2 contains no unbound values and because the nodes of Tree 2 specify values and subtree structure for corresponding unbound nodes in Tree 1 or have values and subtree structure that are equal to the corresponding nodes in Tree 1.
Continuing in
Now compare Tree 2 and Tree 3 in
Finally, compare Tree 3 and Tree 4 in
In summary:
- All nodes specialize corresponding Unbound nodes “?”;
- Undefined nodes “_” specialize all corresponding nodes;
- All nodes generalize corresponding Undefined nodes “_”; and
- Unbound nodes “?” generalize all corresponding nodes.
Given the definitions of specialize and generalize for corresponding nodes, specialize and generalize may further be defined for any two trees, t1 and t2:
- 1.) t1 specializes t2 iff every node in t1 specializes, or is equal to, the corresponding node in t2;
- 2.) t1 generalizes t2 iff every node in t1 generalizes, or is equal to, the corresponding node in t2;
- 3.) the relation specialize is the inverse of generalize; for any two trees t1 and t2, t1 specializes t2 iff t2 generalizes t1; and
- 4.) the relations are transitive. For instance, if t1 specializes t2 and t2 specializes t3, then t1 specializes t3. On the other hand, if t1 generalizes t2 and t2 generalizes t3, then t1 generalizes t3.
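By way of illustration only, the relations defined above may be sketched in Python as follows. The representation of a tree as a pair (value, children), the rule that an omitted child is treated as unbound, and the three sample trees (simplified trees loosely modeled on the curve and point nodes described above, not reproductions of the figures) are assumptions made for this sketch.

```python
# A tree is modeled as a pair (value, children); children maps node names to trees.
UNBOUND, UNDEFINED = "?", "_"
ANY = (UNBOUND, {})  # assumption: a child that is not written out is unbound


def specializes(t1, t2):
    """True if t1 specializes, or is equal to, t2 at every corresponding node."""
    (v1, c1), (v2, c2) = t1, t2
    if v2 == UNBOUND or v1 == UNDEFINED:
        return True   # anything specializes "?"; "_" specializes anything
    if v1 != v2:
        return False  # differing data values, or "?" against a bound value
    return all(specializes(c1.get(n, ANY), c2.get(n, ANY)) for n in set(c1) | set(c2))


def generalizes(t1, t2):
    """The inverse relation: t1 generalizes t2 iff t2 specializes t1."""
    return specializes(t2, t1)


general = ("tweak", {"curve": ANY, "point": ANY})
ir_curve = ("tweak", {
    "curve": ("irCurve", {"ccy": ("USD", {})}),
    "point": ("curvePoint", {"maturity": ("1Y", {})}),
})
swaption = ("tweak", {
    "curve": ("irSwaptionVol", {"ccy": ("USD", {}), "index": ("SWAP", {})}),
    "point": ANY,
})
```

In this sketch `general` generalizes both of the other trees, whereas `ir_curve` and `swaption` are not related in either direction because their “curve” nodes hold different data values.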
Tree Operations
Having defined specialize and generalize, the query operation may now be defined in accordance with another aspect of this invention. The query operation takes a query tree (or partial tree) as input and returns all the trees in a given forest that specialize, or are equal to, the query tree.
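By way of illustration only, the query operation may be sketched in Python as a filter over the forest. The representation of a tree as a pair (value, children), the rule that an omitted child is treated as unbound, and the sample forest (including the date “15 Jul. 2003”) are assumptions made for this sketch.

```python
# A tree is modeled as a pair (value, children); children maps node names to trees.
UNBOUND, UNDEFINED = "?", "_"
ANY = (UNBOUND, {})  # assumption: a child that is not written out is unbound


def specializes(t1, t2):
    """True if t1 specializes, or is equal to, t2 at every corresponding node."""
    (v1, c1), (v2, c2) = t1, t2
    if v2 == UNBOUND or v1 == UNDEFINED:
        return True
    if v1 != v2:
        return False
    return all(specializes(c1.get(n, ANY), c2.get(n, ANY)) for n in set(c1) | set(c2))


def query(forest, query_tree):
    """Return every tree in the forest that specializes, or equals, the query tree."""
    return [t for t in forest if specializes(t, query_tree)]


# Made-up sample forest.
forest = [
    ("resultMTM", {"businessDate": ("14 Jul. 2003", {}),
                   "vc": ("vc", {"env": ("baseEnv", {})})}),
    ("resultMTM", {"businessDate": ("15 Jul. 2003", {}),
                   "vc": ("vc", {"env": ("baseEnv", {})})}),
    ("resultMTM", {"businessDate": ("14 Jul. 2003", {}),
                   "vc": ("vc", {"env": ("perturbEnv", {})})}),
]

# Constrain only the business date; every other node stays unbound.
hits = query(forest, ("resultMTM", {"businessDate": ("14 Jul. 2003", {})}))
```

A query tree consisting of a single unbound root returns the whole forest, which corresponds to the unconstrained starting point of a navigation.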
There are two other important operations on trees, called intersect and extend. Each operation takes two trees as input and returns a single tree as output. Both operations work by comparing corresponding nodes (nodes having the same relative position) in the two input trees. The value of each node may be a value of a data type (e.g., a string, float, or a user-defined data type), or it may be one of the special values unbound or undefined. For each pair of corresponding nodes, the input types determine the result. Table 1 presents the rules that define intersect, wherein “v” represents a value of a data type (i.e., not a special value), and nodes 1 and 2 are corresponding nodes in the two input trees. The result node is the node of the output tree that corresponds to nodes 1 and 2 of the input trees.
In words:
- if either of the nodes has an Undefined value, the result is a node with an Undefined value;
- if one of the nodes has an Unbound value and the other has value v, the result is a node with value v;
- if both nodes have different values, v1 and v2, the result is a node with an Undefined value; and
- if both nodes have the same value, v, the result is a node with value v.
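By way of illustration only, the rules for intersect stated above may be sketched in Python as follows. The representation of a tree as a pair (value, children), the rule that an omitted child is treated as unbound, the result for two Unbound nodes (taken here to be Unbound), and the sample trees are assumptions made for this sketch.

```python
# A tree is modeled as a pair (value, children); children maps node names to trees.
UNBOUND, UNDEFINED = "?", "_"
ANY = (UNBOUND, {})  # assumption: a child that is not written out is unbound


def intersect(t1, t2):
    """Combine corresponding nodes of two trees according to the intersect rules."""
    (v1, c1), (v2, c2) = t1, t2
    if UNDEFINED in (v1, v2):
        return (UNDEFINED, {})  # either node Undefined -> Undefined
    if v1 == UNBOUND:
        return t2               # Unbound against v -> v, with its substructure
    if v2 == UNBOUND:
        return t1
    if v1 != v2:
        return (UNDEFINED, {})  # different values v1 and v2 -> Undefined
    # Same value v: keep v and apply the rules to each pair of corresponding children.
    names = sorted(set(c1) | set(c2))
    return (v1, {n: intersect(c1.get(n, ANY), c2.get(n, ANY)) for n in names})


a = ("tweak", {"curve": ("irCurve", {"ccy": ("USD", {})}), "point": ANY})
b = ("tweak", {"curve": ("irCurve", {"ccy": ("EUR", {})}),
               "point": ("curvePoint", {"maturity": ("1Y", {})})})

both = intersect(a, b)
```

In this sketch the conflicting “ccy” values collapse to Undefined, while the unbound “point” node of the first tree takes on the value and substructure supplied by the second.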
As illustrated in Table 2, the rules that define extend are similar. In fact, extend is the dual of intersect, where “_” and “?” are interchanged.
In words, the rules for extend are:
- if either of the nodes is Unbound, the result node is Unbound;
- if one of the nodes is Undefined and the other has value v, the result node has value v;
- if both nodes have different values, v1 and v2, the result node is Unbound;
- if both nodes have the same value, v, the result node has value v.
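By way of illustration only, the rules for extend stated above may be sketched in Python as follows. The representation of a tree as a pair (value, children), the rule that an omitted child is treated as unbound, the result for two Undefined nodes (taken here to be Undefined), and the sample trees are assumptions made for this sketch.

```python
# A tree is modeled as a pair (value, children); children maps node names to trees.
UNBOUND, UNDEFINED = "?", "_"
ANY = (UNBOUND, {})  # assumption: a child that is not written out is unbound


def extend(t1, t2):
    """Combine corresponding nodes of two trees according to the extend rules."""
    (v1, c1), (v2, c2) = t1, t2
    if UNBOUND in (v1, v2):
        return (UNBOUND, {})  # either node Unbound -> Unbound
    if v1 == UNDEFINED:
        return t2             # Undefined against v -> v, with its substructure
    if v2 == UNDEFINED:
        return t1
    if v1 != v2:
        return (UNBOUND, {})  # different values v1 and v2 -> Unbound
    # Same value v: keep v and apply the rules to each pair of corresponding children.
    names = sorted(set(c1) | set(c2))
    return (v1, {n: extend(c1.get(n, ANY), c2.get(n, ANY)) for n in names})


a = ("tweak", {"curve": ("irCurve", {"ccy": ("USD", {})}), "point": ANY})
b = ("tweak", {"curve": ("irCurve", {"ccy": ("EUR", {})}),
               "point": ("curvePoint", {"maturity": ("1Y", {})})})

widened = extend(a, b)
```

In this sketch the conflicting “ccy” values widen to Unbound, and the “point” node stays Unbound because one of the inputs leaves it unconstrained.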
The operation intersect is symmetrical, so
- intersect(t1,t2)=intersect(t2,t1)
Also, there is a close relationship between intersect and specialize. For example,
- if t3=intersect(t1,t2),
- then t3 specializes t1 and t3 specializes t2.
More generally, any tree that specializes both t1 and t2 will also specialize t3. The result of intersect is like the highest common factor of the two inputs: it is the most general tree that specializes both of the inputs. Similarly, the following relationships hold for extend and generalize.
- extend(t1,t2)=extend(t2,t1)
And,
- if t3=extend(t1,t2),
- then t3 generalizes t1 and t3 generalizes t2.
The result of extend is similar to the lowest common denominator of the two inputs: it is the most specialized tree that generalizes both of the input trees.
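By way of illustration only, the relationships stated above between intersect, extend, specialize and generalize may be checked on sample trees with the following Python sketch. The representation of a tree as a pair (value, children), the rule that an omitted child is treated as unbound, the helper name `relationships_hold`, and the sample trees are assumptions made for this sketch; a check on two sample trees is not a proof of the general property.

```python
# A tree is modeled as a pair (value, children); children maps node names to trees.
UNBOUND, UNDEFINED = "?", "_"
ANY = (UNBOUND, {})  # assumption: a child that is not written out is unbound


def specializes(t1, t2):
    (v1, c1), (v2, c2) = t1, t2
    if v2 == UNBOUND or v1 == UNDEFINED:
        return True
    if v1 != v2:
        return False
    return all(specializes(c1.get(n, ANY), c2.get(n, ANY)) for n in set(c1) | set(c2))


def combine(t1, t2, absorbing, neutral):
    """Shared shape of intersect and extend; the two special values swap roles."""
    (v1, c1), (v2, c2) = t1, t2
    if absorbing in (v1, v2):
        return (absorbing, {})
    if v1 == neutral:
        return t2
    if v2 == neutral:
        return t1
    if v1 != v2:
        return (absorbing, {})
    names = sorted(set(c1) | set(c2))
    return (v1, {n: combine(c1.get(n, ANY), c2.get(n, ANY), absorbing, neutral)
                 for n in names})


def intersect(t1, t2):
    return combine(t1, t2, absorbing=UNDEFINED, neutral=UNBOUND)


def extend(t1, t2):
    return combine(t1, t2, absorbing=UNBOUND, neutral=UNDEFINED)


def relationships_hold(t1, t2):
    """Check symmetry and the specialize/generalize relationships for one pair."""
    low, high = intersect(t1, t2), extend(t1, t2)
    return (low == intersect(t2, t1) and high == extend(t2, t1)
            and specializes(low, t1) and specializes(low, t2)
            and specializes(t1, high) and specializes(t2, high))


a = ("tweak", {"curve": ("irCurve", {"ccy": ("USD", {})}), "point": ANY})
b = ("tweak", {"curve": ("irCurve", {"ccy": ("EUR", {})}),
               "point": ("curvePoint", {"maturity": ("1Y", {})})})
ok = relationships_hold(a, b)
```

Writing both operations through one helper with the roles of “_” and “?” interchanged is a design choice of this sketch that makes the duality between intersect and extend explicit.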
In
Turning now to
The examples of
Query trees may be generated by applying a mask to a data tree. In particular, the query tree is generated when an extend operation is performed using a data tree (Tree 1,
When the mask (Tree 2,
These examples are intended to illustrate the use of trees to specify operations on complex data. They are based on use cases from the field of financial risk management. The tree structures are based on complex, real-time models, but they have been simplified to make the examples clearer. Further, these examples have been chosen to illustrate operations on trees, rather than to illustrate best practice in financial modeling.
Update
As discussed above, a database update is typically implemented as a deletion followed by an addition. Conceptually, the deletion will remove all existing trees that are in some way equivalent to those that are to be added. However, the exact nature of the equivalence often depends on the context of the application. For example, consider the following two update use cases for risk results:
- (a) all the risk points for a given trade are updated, and replaced by a new set; and
- (b) a single risk point for a given trade is updated, leaving all other risk points unchanged.
Note that in general (a) is not equivalent to repeated application of (b), because the new set may include different risk indicators or have a different number of elements.
The application developer needs to specify a set of query patterns that will select all the trees to be deleted in each use case. The problem is that the individual trees in the set of updates may all have different structures, and, therefore, the required query is different for each of the possible structures.
The solution to this problem is to specify the update operation using a tree mask. The mask makes use of undefined and unbound nodes. When the mask is applied to a data tree, the parts of the data tree corresponding to undefined parts of the mask are left unchanged, whereas the parts of the data tree corresponding to unbound parts of the mask become unbound. This results in a query tree with constrained nodes matching the parts of the data tree corresponding to the undefined parts of the mask. A single mask may generate many different query patterns depending upon the data tree it is applied to. This process will be explored in detail in the following two use cases.
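By way of illustration only, the generation of query trees from a mask may be sketched in Python as follows, using the extend rules given earlier. The representation of a tree as a pair (value, children), the flattened `exposure` tree with its node names, the helper name `query_trees`, and the amounts are assumptions made for this sketch and are far simpler than the trees of the use cases.

```python
# A tree is modeled as a pair (value, children); children maps node names to trees.
UNBOUND, UNDEFINED = "?", "_"
ANY = (UNBOUND, {})      # in a mask: the corresponding data node becomes unbound
KEEP = (UNDEFINED, {})   # in a mask: the corresponding data node is left unchanged


def extend(t1, t2):
    (v1, c1), (v2, c2) = t1, t2
    if UNBOUND in (v1, v2):
        return (UNBOUND, {})
    if v1 == UNDEFINED:
        return t2
    if v2 == UNDEFINED:
        return t1
    if v1 != v2:
        return (UNBOUND, {})
    names = sorted(set(c1) | set(c2))
    return (v1, {n: extend(c1.get(n, ANY), c2.get(n, ANY)) for n in names})


def exposure(name, ccy, direction, maturity, amount):
    """Hypothetical, flattened stand-in for one risk-exposure tree."""
    return ("exposure", {
        "name": (name, {}), "ccy": (ccy, {}), "direction": (direction, {}),
        "maturity": (maturity, {}), "amount": (amount, {}),
    })


def query_trees(inputs, mask):
    """Apply the mask to each input tree and keep the distinct query trees."""
    result = []
    for tree in inputs:
        q = extend(tree, mask)
        if q not in result:
            result.append(q)
    return result


mask = ("exposure", {"name": KEEP, "ccy": KEEP, "direction": KEEP,
                     "maturity": ANY, "amount": ANY})
inputs = [exposure("Trade1", "EUR", "UP", m, a)
          for m, a in [("1Y", "11"), ("2Y", "12"), ("3Y", "13"), ("4Y", "14")]]

queries = query_trees(inputs, mask)
more = query_trees(inputs + [exposure("Trade1", "USD", "UP", "1Y", "5")], mask)
```

In this sketch the four EUR inputs collapse to a single query tree, and adding an input with a different currency yields a second query tree from the same mask.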
Update: Use Case 1
For purposes of describing this use case scenario, assume a database containing a collection of tree data structures. Also assume that the constraint pane 102 (
The query tree of
Vc 218 is constrained to value “vc” 220 and has child node “env” 222. Node env 222 is constrained to value “perturbEnv” 1412 and has child node “perturbation” 802. Node perturbation 802 is constrained to value “tweak” 804 and has three child nodes: “curve” 806, “point” 808 and “tweakSpec” 810a (as used in the examples of
Node tweakSpec 810a is constrained to value “tweakSpec” 810b, which has spec 812 as a child node. Spec 812 is constrained to value “perturbationSpec” 814, which includes “direction” 816 and “amount” 818 as child nodes, both of which are unbound.
The node “delta” 1406 is constrained to value “money” 1414, which has “amount” 1416 as an unbound child node.
Having set forth the query tree and the set of trees identified by the query tree, use case 1 will now be described. Use case 1 concerns an update operation. Suppose that risk exposure results for a portfolio of trades have been calculated and stored, but it is then decided to revalue the EUR Interest Rate Zero Curve (IRZero) exposure for one of the trades based on a different set of shift sizes (“tweak” 804 amounts). The new exposure values are based on an upward shift and they are to replace any existing EUR exposure values based on an upward shift. The existing results that were computed using a downward shift are to be left unchanged, as are results for other trades. In the original computation, the trade had exposure to three points (1Y, 2Y, 3Y) on the EUR curve, but in the new computation there is exposure to an additional point (4Y).
The newly computed exposures (“new” or “input” data) are represented as tree data structures that are to be inserted into the set of trees shown in Table 3 in place of the originally computed exposures (“old” data). Node values of the “input” data are shown in Table 4. The rows in Table 4 each represent node values of one of the trees of input data, each tree representing one new exposure. Table 4 is therefore considered a set of input data to be added to the database in place of the older data they are replacing. Note the extra 4Y point.
The update involves two basic steps: delete all “old” data and replace it with the “input” data, as shown in Table 4. The “old” data includes all trees described in Table 3 having name=Trade1, ccy=EUR and direction=UP. In accordance with one aspect of this invention, a simple way to identify the trees for deletion is to define explicitly a query pattern that matches the required results. Such explicit pattern matching is shown in
By performing the query operation using the query tree of
The query tree of
A better approach is to generate the required query tree from input values, in accordance with another aspect of this invention. The required query tree can be derived using tree operators. Thus, instead of making a query tree that matches the results directly, a mask is generated that is applied to the input data in order to generate one or more query trees (or patterns). When these query trees are applied to the collection of trees in the database, they identify only those trees that must be deleted in order to complete the update. An exemplary mask is given in
When the mask of
The mask of
This time the mask of
After one or more distinct query trees have been generated, the query trees are applied to the collection of trees in the database using the query operation, previously discussed, to identify which trees are to be deleted. In the case of the query trees shown in
In the next example, the mask of
When the mask of
The generated query tree of
Update: Use Case 2
Use Case 1 dealt with updating a category of data, or a set of trees having one or more common characteristics. In contrast, Use Case 2 deals with updating a single tree in a forest. Performing a single tree update is no different than performing a group update, except that the query tree must be more specialized to focus in on only one tree in the database. Thus, a more specific query tree is generated using a mask having more nodes with undefined values.
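By way of illustration only, an update implemented as a deletion followed by an addition may be sketched in Python as follows, with a group mask in the manner of Use Case 1 and a more specialized mask in the manner of Use Case 2. The representation of a tree as a pair (value, children), the flattened `exposure` tree, the function name `update`, and all sample data are assumptions made for this sketch.

```python
# A tree is modeled as a pair (value, children); children maps node names to trees.
UNBOUND, UNDEFINED = "?", "_"
ANY, KEEP = (UNBOUND, {}), (UNDEFINED, {})  # mask leaves: "becomes unbound" / "unchanged"


def specializes(t1, t2):
    (v1, c1), (v2, c2) = t1, t2
    if v2 == UNBOUND or v1 == UNDEFINED:
        return True
    if v1 != v2:
        return False
    return all(specializes(c1.get(n, ANY), c2.get(n, ANY)) for n in set(c1) | set(c2))


def extend(t1, t2):
    (v1, c1), (v2, c2) = t1, t2
    if UNBOUND in (v1, v2):
        return (UNBOUND, {})
    if v1 == UNDEFINED:
        return t2
    if v2 == UNDEFINED:
        return t1
    if v1 != v2:
        return (UNBOUND, {})
    names = sorted(set(c1) | set(c2))
    return (v1, {n: extend(c1.get(n, ANY), c2.get(n, ANY)) for n in names})


def exposure(name, ccy, direction, maturity, amount):
    """Hypothetical, flattened stand-in for one risk-exposure tree."""
    return ("exposure", {
        "name": (name, {}), "ccy": (ccy, {}), "direction": (direction, {}),
        "maturity": (maturity, {}), "amount": (amount, {}),
    })


def update(forest, inputs, mask):
    """Delete every tree matched by a mask-generated query, then add the inputs."""
    queries = [extend(t, mask) for t in inputs]
    kept = [t for t in forest if not any(specializes(t, q) for q in queries)]
    return kept + list(inputs)


old = [exposure("Trade1", "EUR", "UP", "1Y", "10"),
       exposure("Trade1", "EUR", "UP", "2Y", "20"),
       exposure("Trade1", "EUR", "UP", "3Y", "30"),
       exposure("Trade1", "EUR", "DOWN", "1Y", "-10"),
       exposure("Trade2", "EUR", "UP", "1Y", "7")]

# Group update: replace all upward EUR exposures of Trade1, adding a 4Y point.
group_mask = ("exposure", {"name": KEEP, "ccy": KEEP, "direction": KEEP})
new = [exposure("Trade1", "EUR", "UP", m, a)
       for m, a in [("1Y", "11"), ("2Y", "12"), ("3Y", "13"), ("4Y", "14")]]
after_group = update(old, new, group_mask)

# Single-tree update: the mask also keeps "maturity", so only the 2Y tree is replaced.
single_mask = ("exposure", {"name": KEEP, "ccy": KEEP, "direction": KEEP,
                            "maturity": KEEP})
after_single = update(old, [exposure("Trade1", "EUR", "UP", "2Y", "99")], single_mask)
```

The only difference between the two updates in this sketch is the number of Undefined nodes in the mask; the deletion and addition steps are identical.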
For example, assume that the “new” data to be inserted into the database is as shown in Table 9, and that the current state of the relevant part of the database is as shown in Table 7.
In order to properly update the database with the data in Table 9, a query tree must be generated that identifies only the tree described in the last row of Table 7. A mask that would produce such a query tree is shown in
Applying the query tree of
Compact Textual Representation of Trees
In the preceding examples, unbound nodes are explicitly shown in order to make the examples clearer. However, unbound nodes do not need to be shown in the textual representation of trees. Where no value has been specified for a node, the unbound node will be supplied by default. Thus, it would have been possible to represent the trees discussed throughout this description in a more compact textual form. For example, the mask of
The textual representation of the tree data structures used throughout this description can be used to easily interface with external applications. For instance, if the tree data structures are stored in a textual format, such as those shown in the accompanying figures, then external applications can easily search and import the data with an appropriate interface.
Exemplary Hardware Implementation
The present invention may be implemented with the hardware arrangement shown in
When the server 2404 is present, the manner of communication between the workstation 2402 and the server 2404 can be of any means known in the art, such as direct wired communication or wireless communication. The workstation 2402 may communicate with the server 2404 via a network, such as a local area network, an intranet, or the Internet, or any other network configuration as is known in the art. When a network is used to communicate between workstation 2402 and server 2404, multiple users may have access to the system. For instance, multiple workstations 2402 may be used, wherein each user has access to the UI 100 shown in
Communication between the workstation 2402 and its display apparatus, such as a monitor, occurs using methods known in the art.
Efficient Implementation of Operations on a Collection of Trees
The present invention also includes a novel tree storage technique that reduces the amount of storage required by the database to store the collection of tree data structures, and reduces response times for operations performed on the collection of tree data structures. The problem with performing operations on a large collection of trees having heterogeneous data is that these types of data structures are often very expensive to store and process in terms of storage capacity and response time. Pairwise operations on trees, such as intersect and extend, require a traversal of the two trees, where nodes of each of the input trees are matched, paired, and transformed. The traversal is performed recursively until the leaves of the tree are reached. When an operation is applied across a collection of trees, the traversal must be performed for every pairing of trees.
Further, each tree structure is composed of many nodes and arcs. This structure can impose a heavy implementation cost in terms of space and time. The large storage requirements result from the cost of creating and copying tree structures because each node has an independent copy of its children.
The novel technique of the present invention, described with reference to
Further, the leaf nodes for each tree 2510 are extracted from each tree 2504 and separately stored as sets of leaves 2512. Each set of leaves 2512 corresponds to the leaves from one tree 2510. Accordingly, instead of storing the complete tree structure, including leaf nodes, for every tree 2504 in the collection of trees, all that is stored is the set of patterns 2508 and the sets of leaves 2512, thereby reducing the storage required for the database.
This technique also decreases response time for operations performed on the collection of trees. Having stored the patterns 2508 and leaves 2512 separately, many tree operations can be decomposed into an operation on the patterns 2506 and an operation on the leaves 2512. The operation on a pattern 2506 need only be performed once for the set of trees to which the pattern corresponds, rather than once for each tree in the corresponding set. If an operation on a pattern 2506 excludes a set of trees 2502, significant processing time is saved, thereby decreasing response time.
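By way of illustration only, the separation of trees into shared patterns and per-tree sets of leaves may be sketched in Python as follows. The representation of a tree as a pair (value, children), the encoding of a pattern as a nested tuple with None marking a leaf position, the function names `split`, `store` and `rebuild`, and the sample trees are assumptions made for this sketch; the disclosure does not prescribe this particular encoding.

```python
from collections import defaultdict

# A tree is modeled as a pair (value, children); children maps node names to trees.


def split(tree):
    """Return (pattern, leaves): the structure without leaf values, and those values."""
    value, children = tree
    if not children:
        return None, [value]          # leaf position: the value goes to the leaf list
    parts, leaves = [], []
    for name in sorted(children):     # fixed order so leaves line up with the pattern
        sub_pattern, sub_leaves = split(children[name])
        parts.append((name, sub_pattern))
        leaves.extend(sub_leaves)
    return (value, tuple(parts)), leaves


def store(forest):
    """Store each distinct pattern once, with one tuple of leaves per tree."""
    table = defaultdict(list)
    for tree in forest:
        pattern, leaves = split(tree)
        table[pattern].append(tuple(leaves))
    return dict(table)


def rebuild(pattern, leaves):
    """Reassemble a tree from its pattern and its leaves."""
    remaining = iter(leaves)

    def build(p):
        if p is None:
            return (next(remaining), {})
        value, parts = p
        return (value, {name: build(sub) for name, sub in parts})

    return build(pattern)


def curve_tree(ccy, maturity):
    return ("tweak", {"curve": ("irCurve", {"ccy": (ccy, {})}),
                      "point": ("curvePoint", {"maturity": (maturity, {})})})


forest = [curve_tree("USD", "1Y"), curve_tree("EUR", "2Y"),
          ("tweak", {"curve": ("irSwaptionVol",
                               {"ccy": ("USD", {}), "index": ("SWAP", {})})})]
table = store(forest)
shared_pattern, _ = split(forest[0])
```

In this sketch the two interest-rate-curve trees share one stored pattern, so a test on that pattern (for example, whether its “curve” node has the value “irCurve”) would be evaluated once for both trees rather than once per tree.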
Response time is further reduced by this technique because the arrangement of
It is to be understood that the above-described embodiment is merely illustrative of the present invention and that many variations of the above-described embodiment can be devised by one skilled in the art without departing from the scope of the invention. It is therefore intended that such variations be included within the scope of the following claims and their equivalents.
Claims
1. A method for navigating a collection of tree data structures stored in a computer-readable database, the method comprising:
- constraining a first node of a query tree stored in a computer-readable memory to a first value;
- making accessible a first set of nodes of the query tree that are connected to the first node constrained to the first value;
- constraining a second node in the first set of nodes to a second value;
- identifying a tree in the collection of tree data structures that contains (1) a first matching node equal in position to the first node and equal to the first value, and (2) a second matching node equal in position to the second node and equal to the second value; and
- accessing data in a select node of the identified tree.
2. The method of claim 1 wherein the select node is the first matching node, the second matching node, or a node connected to the first or second matching nodes of the identified tree.
3. The method of claim 1 further comprising:
- making accessible a second set of nodes of the query tree that are connected to the second node constrained to the second value.
4. The method of claim 3 wherein the select node is equal in position to the first node of the query tree, the second node of the query tree, or a node in the accessible first or second sets of nodes of the query tree.
5. The method of claim 1 wherein the first value and the second value are selected from the group consisting of a data value, an unbound special value, and an undefined special value.
6. The method of claim 1 wherein a structure of the query tree is determined by available tree structures in the collection of tree data structures.
7. In a computer system having a graphical user interface including a display device and one or more input devices, a method for navigating a collection of tree data structures stored in a computer-readable database, the method comprising:
- receiving a first value from the one or more input devices to which a first node of a query tree stored in a computer-readable memory is constrained;
- displaying with the display device a first set of nodes of the query tree that are connected to the first node constrained to the first value;
- identifying a tree in the collection of tree data structures that contains a first matching node equal in position to the first node and equal to the first value; and
- displaying with the display device data in a select node of the identified tree.
8. The method of claim 7 wherein the select node is the first matching node or a node connected to the first matching node of the identified tree.
9. The method of claim 7 further comprising:
- receiving a second value from the one or more input devices to which a second node in the first set of nodes is constrained; and
- displaying with the display device a second set of nodes of the query tree that are connected to the second node constrained to the second value.
10. The method of claim 9 wherein identifying the tree identifies a tree in the collection of tree data structures that contains (1) a first matching node equal in position to the first node and equal to the first value, and (2) a second matching node equal in position to the second node and equal to the second value.
11. The method of claim 10 wherein the select node is the first matching node, the second matching node, or a node connected to the first or second matching nodes of the identified tree.
12. The method of claim 9, wherein identifying the tree identifies a plurality of trees in the collection of tree data structures that contain (1) a first matching node equal in position to the first node and equal to the first value, and (2) a second matching node equal in position to the second node and equal to the second value, and wherein displaying the data in the select node displays data in a plurality of select nodes of each of the identified plurality of trees.
13. The method of claim 12 wherein each of the plurality of select nodes are the first matching node, second matching node, or a node connected to the first or second matching nodes of the respective identified trees.
14. The method of claim 12 wherein each of the plurality of select nodes are equal in position to the first node of the query tree, the second node of the query tree, or a node in the first or second sets of nodes of the query tree.
15. The method of claim 12 wherein displaying the data in the plurality of select nodes displays, with the display device, the data of the plurality of select nodes in a tabular format.
16. The method of claim 12 further comprising:
- displaying the query tree in a constraint pane,
- wherein the displaying of the first set of nodes is displayed in the constraint pane,
- wherein the displaying of the second set of nodes is displayed in the constraint pane, and
- wherein the displaying of the data in the plurality of select nodes displays the data in a data pane.
17. The method of claim 9 wherein the first value and the second value are selected from the group consisting of a data value, an unbound special value, and an undefined special value.
18. The method of claim 7 wherein a structure of the query tree is determined by available tree structures in the collection of tree data structures.
19. A system for navigating a collection of tree data structures, the system comprising:
- a database component operative to maintain a database of tree data structures;
- a memory component operative to store a query tree;
- an input component;
- a display component; and
- a processing component communicatively connected to the database component, the memory component, the input component, and the display component, the processing component programmed to perform actions comprising: interpreting a first signal from the input component as an instruction to constrain a first node of the query tree to a first value; constraining the first node of the query tree to the first value; transmitting an instruction to the display component to display a first set of nodes of the query tree that are connected to the first node constrained to the first value; communicating with the database component to identify a tree in the database of tree data structures that contains a first matching node equal in position to the first node and equal to the first value; and transmitting an instruction to the display component to display data in a select node of the identified tree.
20. The system of claim 19 wherein the select node is the first matching node or a node connected to the first matching node of the identified tree.
21. The system of claim 19 wherein the processing component is programmed to perform actions further comprising:
- interpreting a second signal from the input component as an instruction to constrain a second node in the first set of nodes to a second value;
- constraining the second node to the second value; and
- transmitting an instruction to the display component to display a second set of nodes of the query tree that are connected to the second node constrained to the second value,
- wherein communicating with the database component communicates with the database component to identify a tree in the database of tree data structures that contains (1) a first matching node equal in position to the first node and equal to the first value, and (2) a second matching node equal in position to the second node and equal to the second value.
22. The system of claim 21 wherein the select node is the first matching node, the second matching node, or a node connected to the first or second matching nodes of the identified tree.
23. The system of claim 21 wherein the select node is equal in position to the first node of the query tree, the second node of the query tree, or a node in the first or second set of nodes of the query tree.
24. The system of claim 21 wherein the first value and the second value are selected from the group consisting of a data value, an unbound special value, and an undefined special value.
25. The system of claim 19 wherein a structure of the query tree is determined by available tree structures in the collection of tree data structures.
Type: Application
Filed: Mar 17, 2004
Publication Date: Mar 24, 2005
Inventors: David Ziemann (London), John Samuel (London)
Application Number: 10/802,710