Methods, apparatus and data structures for providing a uniform representation of various types of information
Methods and apparatus for analyzing tasks performed by computer users by (i) gathering usage data, (ii) converting logged usage data into a uniform format, (iii) determining or defining task boundaries, and (iv) determining a task analysis model by “clustering” similar tasks together. The task analysis model may be used to (i) help users complete a task (such help, for example, may be in the form of a gratuitous help function), and/or (ii) to target marketing information to users based on user inputs and the task analysis model. The present invention also provides a uniform semantic network for representing different types of objects in a uniform way.
§ 1.1 Field of the Invention
The present invention concerns analyzing computer-based tasks to (i) define and infer tasks and end goals from usage data, (ii) cluster similar tasks together, (iii) determine probabilities that certain tasks will be performed, (iv) determine the different ways in which users go about completing a given task, (v) use models of clustered tasks and probabilities of clustered tasks to help computer users to perform such tasks more effectively and efficiently, and (vi) target marketing information to computer users based on a task being performed. The present invention also concerns providing a uniform semantic network for representing different types of objects (or information) in a uniform way.
§ 1.2 Related Art
§ 1.2.1 Task Performance
A task may be defined as a goal achieved by performing a sequence of steps. People often rely on computers to complete tasks. Different computer applications are tailored to help people perform different tasks. For example: a word processing application may be used to generate a letter, generate a food recipe card, or generate a table of contents for a paper; a spreadsheet application may be used to determine an accounts receivable value or determine a taxable income value; a drafting application may be used to generate an organizational chart, prepare a block diagram, or lay out a floor plan for a new kitchen; a database or Internet browser application may be used to find crash test results for new cars, get a stock quote, plan an evening out with dinner and a movie, or find an employee's telephone extension.
Computer applications are designed based on predictions of how likely it is that most users will want to perform certain tasks. Special provisions (e.g., toolbars, hierarchical menus, special keyboard keys, etc.) to assist the user in performing such tasks are provided based on assumptions made by the application designer(s). Thus, for example, in the context of a word processing application, a spell checking task may be designed to be easier to perform than a bibliography generating task because it is assumed that users will more likely want to perform a spell check task than a bibliography generation task. Similarly, a keyboard may be provided with an addition (“+”) key but not an integration (“∫”) key because it is assumed that it is more likely that users will want to include a “+” sign in a document than a “∫” sign in a document. In the context of browsing the contents of an Internet “site” or “website” (i.e., an Internet resource server), a topology of the Internet site may be designed based on expected usage of (e.g., requests for) various resources. Thus, for example, reviews of newly released movies may be easier to request (or navigate to) than reviews of older movies.
Assumptions about what tasks people want to perform and how people intuitively go about performing tasks are reflected in the design of computer applications, the topology of resource servers, such as Internet sites for example, and user interface methods (such as forms and frames) used in interactive applications and resource servers. Unfortunately, once designed, a computer application is relatively fixed. Similarly, the topology of most Internet sites is relatively static. Consequently, computer applications and Internet site topologies are typically only as good as the assumptions which underlie their design. Even if the designs of computer applications and Internet site topologies are based on well-founded assumptions about what types of tasks users will likely want to perform and how they will go about performing such tasks, such assumptions may become stale as people want to perform different tasks.
Moreover, certain tasks will often span various computer applications. For example, a task may be to generate an annual report. Generating such a report may involve entering text by means of a word processing application, determining financial figures with a spreadsheet application, and generating a block diagram using a drafting application. It is difficult for designers of individual applications to anticipate such inter-application tasks and design their applications accordingly.
In view of the foregoing problems with computer-based tools for performing various tasks, methods and apparatus for analyzing what computer users are doing—more specifically what tasks are being performed by users and how such tasks are being performed—are needed. Moreover, methods and apparatus are needed for using such task analysis to help computer users to effectively perform desired tasks.
§ 1.2.2 Marketing Information Dissemination
As discussed above, resource servers, such as Internet websites for example, permit people to access a great deal of information. In addition to their function of providing resources to computer users, Internet sites provide a new conduit for disseminating marketing information to people. Often, marketing information is closely related to the resources requested. For example, an Internet resource providing stock quotations may include an advertisement for a stockbroker, or an Internet resource providing sports scores may include an advertisement for a baseball game to be televised. However, such marketing information is related to the characteristics of the Internet resource itself, not to the task being performed by the user requesting the resources. Thus methods and apparatus for providing marketing information relevant to a task being performed are needed.
§ 1.2.3 Object (or Information) Representation
Computer users may use various types of applications and software services. The applications and software services, in turn, may use different types of stored objects (as information, data, or executable code). For example, some objects, such as relational database structures, XML (Extensible Markup Language), and RDF (Resource Description Framework), may be characterized as “structured objects”. More specifically, relational databases are defined by elements structured into rows and columns of tables. XML defines trees based on containment relationships (e.g., an organization contains groups, and each of the groups contains members). Other objects, such as DCOM and JAVA runtime objects for example, may be characterized as “active objects”. Active objects may be objects that define methods and/or variables, in the object oriented language sense. Further, techniques are available (See, e.g., U.S. Pat. Nos. 5,740,439, 5,682,536, 5,689,703, and 5,581,760, each of which is incorporated herein by reference) to “expose” machine executable instructions as objects. Still other objects, such as text documents for example, may be characterized as “linear objects.” Some objects may have more than one type. For example, HTML (Hyper-Text Markup Language) documents may include linear text, and may include hyper-text links defining a hierarchical structure.
To reiterate, applications and application services are typically tailored to only those underlying object or information type(s) that are relevant to the particular application or application service. Unfortunately, it is not easy to implement inter-application services, such as the task analysis discussed above, which use various types of objects. Thus, a uniform representation of various types of objects (or information) would be useful.
§ 2 SUMMARY OF THE INVENTION
The present invention provides methods and apparatus for analyzing tasks performed by computer users. First, the present invention includes methods and apparatus to gather usage data. That is, when performing tasks, users will interact with the computer and perform a number of steps (i.e., user inputs) in an attempt to complete the task. These steps (user inputs) are logged in a usage log for further analysis. Second, the present invention includes methods, apparatus, and data structures to convert logged usage data into a uniform format. More specifically, objects (e.g., machine executable instructions, various types of database resources, text files, etc.) invoked pursuant to the user inputs may be expressed with a uniform representation. The present invention defines a uniform representation which may be used and provides methods and apparatus for mapping between objects (or information) having a specific type, and the same objects (or information) expressed with the uniform representation. Third, the present invention includes methods and apparatus to determine or define task boundaries. That is, a computer user may interact with a computer to perform a number of tasks during a single session or may perform a single task over a number of sessions. Fourth, the present invention includes methods and apparatus to define task boundaries from the converted (or non-converted, uniform) usage data. Finally, the present invention includes methods and apparatus to generate a task analysis model from the defined tasks. More specifically, the present invention may function to “cluster” similar tasks together. The task model may use a limit on (a) the number of clusters, and/or (b) the distance (i.e., “dissimilarity”) between the clusters, when generating the model.
The present invention also includes methods and apparatus which use the task analysis model. First, the present invention includes methods and apparatus for designing application user interfaces such as tool bars, hierarchical menus, gratuitous help, etc. In this instance, probabilities of tasks from the task analysis model may be used to determine what tasks users will likely want to perform. Human design factors, such as how many functions users like on a toolbar or how many levels of menus they like may be used when generating the task analysis model to determine how many clusters the model should have.
The present invention also includes methods and apparatus which use the task analysis model for designing a topology of a resource server, such as an Internet website for example. As was the case with designing application user interfaces, in this instance, probabilities of tasks from the task analysis model may be used to determine what tasks users will likely want to perform. Human design factors, such as how many hyper-text links or query boxes on a single web page users like may be used when determining the topology of the resource server interface.
The present invention also includes methods and apparatus to help users complete a task based on the task analysis model. Such help, for example, may be in the form of a gratuitous help function. Basically, a run-time application will look at steps being performed by the user and determine if such steps “belong to” a task cluster of the task analysis model. If the steps performed by the user appear to “belong to” a task cluster, the user may be provided with gratuitous help. For example, the application may communicate to the user, “It seems that you are trying to generate an annual report. May I help you complete this task?” Alternatively, when it can be established, with a requisite degree of certainty, that the user is trying to perform a particular task, the application may automatically complete that task without further input from the user, or the application may guide the user through the remaining steps for completing the task in an efficient manner.
Finally, the present invention includes methods and apparatus to target marketing information to users based on user inputs and a task analysis model. For example, the Internet has permitted companies to target marketing information to narrow niches of potential customers. For example, a web page providing stock quotes may advertise a stock broker, a web page providing telephone numbers may advertise a long distance telephone carrier, etc. However, the present invention permits tasks to be more generalized. For example, it may recognize that an Internet user submitting queries for a restaurant in a certain neighborhood may be planning a date including dinner and a movie. Thus, in this case, the present invention might function to provide movie advertisements along with the restaurant information resources.
§ 3 BRIEF DESCRIPTION OF THE DRAWINGS
§ 4 DETAILED DESCRIPTION
The present invention concerns novel methods and apparatus for analyzing tasks being performed by users and for analyzing how such tasks are being performed. The present invention also concerns novel methods, apparatus, and data structures for representing various types of objects in a uniform way. The following description is presented to enable one skilled in the art to make and use the invention, and is provided in the context of particular applications and their requirements. Various modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles set forth below may be applied to other embodiments and applications. Thus, the present invention is not intended to be limited to the embodiments shown.
Below, function(s) of the present invention will be described in § 4.1. Thereafter, the structures of exemplary embodiments and exemplary methods of the present invention will be described in § 4.2. Finally, examples of operations of the present invention will be described in § 4.3.
§ 4.1 Function of the Present Invention
In this section, the basic functions performed by the present invention will be introduced. The functions may be divided into functions that may be performed when a user is not performing a task (also referred to as “off-line”) and those that may be performed while the user is performing a task (also referred to as “run-time”). The off-line functions are introduced in § 4.1.1 below. The run-time functions are introduced in § 4.1.2 below.
§ 4.1.1 Off-Line Functions
There are five (5) basic off-line functions that may be carried out by the present invention. Each of the five (5) off-line functions is introduced below. First, the present invention may function to gather usage data. That is, when performing tasks, users will interact with the computer and perform a number of steps (i.e., user inputs) in an attempt to complete the task. These steps (user inputs) are logged in a usage log for further analysis. An example of this function is described in § 4.2.3.1 below.
Second, the present invention may function to convert logged usage data into a uniform format. More specifically, objects or information (e.g., software executables, various types of database resources, etc.) invoked pursuant to the user inputs may be expressed in a common manner. An example of this function is described in § 4.2.3.2 below.
Third, the present invention may function to determine or define task boundaries. That is, a computer user may interact with a computer to perform a number of tasks during a single session (a “session” may be defined as a predetermined period of activity followed by a predetermined period of inactivity) or may perform a single task over a number of sessions. Examples of this task boundary definition function are described in § 4.2.3.3 below.
Fourth, the present invention may function to determine a task analysis model from the converted (or non-converted, uniform) usage data. More specifically, the present invention may function to “cluster” similar tasks together. The task model may use a limit on (a) the number of clusters, and/or (b) the distance (i.e., “dissimilarity”) between the clusters, when generating the model. An example of this function is described in § 4.2.3.4 below.
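The clustering limits mentioned above can be illustrated with a minimal sketch. It is not the clustering method of the present invention, which is described in § 4.2.3.4. It assumes, purely for illustration, that each task is a set of step identifiers, that dissimilarity is the Jaccard distance, and that clusters are merged by average linkage. The names `jaccard_distance`, `cluster_tasks`, `max_clusters`, and `max_distance` are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Dissimilarity between two tasks, each a set of step identifiers (assumed form)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_tasks(tasks, max_clusters=None, max_distance=None):
    """Greedy agglomerative clustering (average linkage) over task indices.

    Merging stops when the number of clusters has dropped to max_clusters
    (if given), or when the two closest clusters are farther apart than
    max_distance (if given). With neither limit, everything merges into one.
    """
    clusters = [[i] for i in range(len(tasks))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(jaccard_distance(tasks[i], tasks[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        if max_clusters is not None and len(clusters) <= max_clusters:
            break
        # Find the closest pair of clusters (i < j by construction).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if max_distance is not None and linkage(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


tasks = [
    {"open", "type", "spellcheck"},
    {"open", "type", "spellcheck", "print"},
    {"query", "restaurant"},
    {"query", "restaurant", "movie"},
]
by_count = cluster_tasks(tasks, max_clusters=2)
by_distance = cluster_tasks(tasks, max_distance=0.3)
```

In this sketch, a count limit suits the case where a human design factor (e.g., the number of toolbar functions) fixes how many clusters are wanted, while a distance limit keeps dissimilar tasks apart regardless of how many clusters result.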
Finally, the present invention may function as a design tool which uses the task analysis model for designing application user interfaces such as tool bars, hierarchical menus, gratuitous help, etc. In this instance, probabilities of tasks from the task analysis model may be used to determine what tasks users will likely want to perform. Human design factors, such as how many functions users like on a toolbar or how many levels of menus they like, may be used when generating the task analysis model to determine how many clusters the model should have.
The present invention may also function as a design tool which uses the task analysis model for designing a topology of a resource server, such as an Internet website for example. As was the case with designing application user interfaces, in this instance, probabilities of tasks from the task analysis model may be used to determine what tasks users will likely want to perform. Human design factors, such as how many hyper-text links or query boxes on a single web page users like may be used when determining the topology of the resource server interface.
§ 4.1.2 Run-Time Functions
Having introduced off-line functions that the present invention may perform, run-time functions that the present invention may perform are now introduced.
First, the present invention may function to help users complete a task based on a task analysis model. Such help, for example, may be in the form of a gratuitous help function. Basically, a run-time application will look at steps being performed by the user and determine if such steps “belong to” a task cluster of the task analysis model. If the steps performed by the user appear to “belong to” a task cluster, the user may be provided with gratuitous help for completing that task. For example, the application may communicate to the user, “It seems that you are trying to generate an annual report. May I help you complete this task?” Alternatively, when it can be established, with a requisite degree of certainty, that the user is trying to perform a particular task, the application may automatically complete that task without further input from the user or the application may guide the user through remaining steps for completing the task in an efficient manner. Examples of these functions are described in § 4.2.3.5 below.
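The run-time decision described above can be sketched as follows. This is an illustrative sketch only; the processes of the present invention are described in § 4.2.3.5. It assumes that each task cluster is summarized by a set of typical steps, that "belonging to" a cluster is scored with Jaccard similarity, and that two thresholds separate offering help from completing the task automatically. The names `best_matching_cluster` and `suggest_action`, the step labels, and the threshold values are all hypothetical.

```python
def best_matching_cluster(observed_steps, cluster_profiles):
    """Return (cluster_id, score) for the profile most similar to the observed steps.

    cluster_profiles maps a cluster ID to a set of typical steps (assumed form).
    The score is the Jaccard similarity between observed and typical steps.
    """
    best_id, best_score = None, 0.0
    for cluster_id, profile in cluster_profiles.items():
        union = observed_steps | profile
        score = len(observed_steps & profile) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = cluster_id, score
    return best_id, best_score


def suggest_action(observed_steps, cluster_profiles,
                   offer_threshold=0.4, auto_threshold=0.9):
    """Map the match score to an action: none, offer help, or auto-complete."""
    cluster_id, score = best_matching_cluster(observed_steps, cluster_profiles)
    if cluster_id is None or score < offer_threshold:
        return ("none", None)
    if score >= auto_threshold:
        # The "requisite degree of certainty" is modeled by auto_threshold.
        return ("auto_complete", cluster_id)
    return ("offer_help", cluster_id)


profiles = {
    "annual_report": {"enter_text", "compute_figures", "draw_diagram", "print"},
    "dinner_and_movie": {"query_restaurant", "query_movie"},
}
partial = suggest_action({"enter_text", "compute_figures"}, profiles)
complete = suggest_action(set(profiles["annual_report"]), profiles)
unrelated = suggest_action({"play_game"}, profiles)
```

Keeping the two thresholds separate reflects the distinction drawn above between asking the user whether help is wanted and acting without further input.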
Second, the present invention may function to target marketing information to users based on user inputs and a task analysis model. For example, the Internet has permitted companies to target marketing information to narrow niches of potential customers. For example, a web page providing stock quotes may advertise a stock broker, a web page providing telephone numbers may advertise a long distance telephone carrier, etc. However, the present invention permits tasks to be more generalized. For example, it may recognize that an Internet user submitting queries for a restaurant in a certain neighborhood may be planning a date including dinner and a movie. Thus, in this case, the present invention might function to provide movie advertisements along with the restaurant information resources. Examples of this “task associated advertising” function are described in § 4.2.3.6 below.
§ 4.2 Structures and Methods of Exemplary Embodiments of the Present Invention
Having introduced various functions which may be performed by the present invention, exemplary embodiments of the present invention will now be described. First, exemplary environments in which the present invention may operate will be described in § 4.2.1 below. Then, exemplary processes for effecting one or more of the functions discussed above will be described, at a high level, in § 4.2.2 below. Thereafter, details of the exemplary processes for effecting the functions discussed above will be described in § 4.2.3 below.
§ 4.2.1 Exemplary Operating Environments
The client 110 includes a user interface process 112 (e.g., a graphical user interface (or “GUI”)), input/output interface processes 114 (e.g., a serial port, a video driver, and a network interface card (or “NIC”)), and a front end application process 116 (e.g., an Internet browser, a database front end, etc.). The user interface process 112 and the front end application process 116 may communicate with each other by means of an input/output interface process 114.
The server 120 includes input/output interface processes 122 (e.g., a bank of network interface cards and a SCSI interface) and a back end application process 124 (e.g., an Internet resource server, a database manager, etc.). Stored objects and/or resources 126 may be accessed by the back end application process 124 by means of an input/output interface process 122 (e.g., the SCSI interface).
Thus, a user at the client 110 may access stored objects and/or resources 126 at the server 120 by means of the user interface process 112 (e.g., a GUI), an input/output interface process 114 (e.g., a serial port), the front end application process 116 (e.g., an Internet browser), an input/output interface process 114 (e.g., a NIC), the network 130 (e.g., the Internet), an input/output interface process 122 (e.g., a NIC), the back end application process 124 (e.g., an Internet resource server), and an input/output interface process 122 (e.g., a SCSI port). As will be discussed below, processes for effecting one or more of the functions of the present invention may be carried out at the client 110 and/or at the server 120.
With reference to
A number of program modules may be stored on the hard disk 223, magnetic disk 229, (magneto) optical disk 231, ROM 224 or RAM 225, such as an operating system 235, one or more application programs 236, other program modules 237, and/or program data 238 for example. A user may enter commands and information into the personal computer 220 through input devices, such as a keyboard 240 and pointing device 242 for example. Other input devices (not shown) such as a microphone, joystick, game pad, satellite dish, scanner, or the like may also be included. These and other input devices are often connected to the processing unit 221 through a serial port interface 246 coupled to the system bus. However, input devices may be connected by other interfaces, such as a parallel port, a game port or a universal serial bus (USB). A monitor 247 or other type of display device may also be connected to the system bus 223 via an interface, such as a video adapter 248 for example. In addition to the monitor, the personal computer 220 may include other peripheral output devices (not shown), such as speakers and printers for example.
The personal computer 220 may operate in a networked environment which defines logical connections to one or more remote computers, such as a remote computer 249. The remote computer 249 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and may include many or all of the elements described above relative to the personal computer 220, although only a memory storage device 250 has been illustrated in
When used in a LAN, the personal computer 220 may be connected to the LAN 251 through a network interface adapter (or “NIC”) 253. When used in a WAN, such as the Internet, the personal computer 220 may include a modem 254 or other means for establishing communications over the wide area network 252. The modem 254, which may be internal or external, may be connected to the system bus 223 via the serial port interface 246. In a networked environment, at least some of the program modules depicted relative to the personal computer 220 may be stored in the remote memory storage device. The network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
§ 4.2.2 High Level Diagrams of Processes
Having described a number of environments within which the present invention may operate, exemplary processes for performing one or more of the functions of the present invention will now be introduced with reference to
§ 4.2.2.1 Off-Line Processes
Application(s) process(es) 310 may effect a computer application such as an Internet browser or a word processor for example. Referring to
As shown in
Since different types of stored objects (or information) 312 may be used by, and/or updated or generated by, one or more application(s) process(es) 310, a uniform object (or information) representation generation process 330 may be used to generate an object usage log having a uniform (universal) format 332. This process 330 will be described in detail later, with reference to
During a given session, more than one task may be performed or attempted. Moreover, one task may be performed over more than one session. Again, each task may have a number of steps. Thus, a task boundary determination process 340 uses task boundary model parameters 349 to define task boundaries within a session(s). Examples of this process 340 will be described in detail in § 4.2.3.3 below. The defined tasks are stored as usage task data 342. The usage task data 342 may include records 344, each of which includes an optional user ID field 345, a sub-a-ERD field 346, an optional time/date stamp field 347, and a task ID field 348. The user ID field 345 of the usage task data records 344 corresponds to the user ID field 325 of the object (or information) usage log records 324 and the user ID field 335 of the object (or information) usage log in universal format records 334. The sub-a-ERD field 346 of the usage task data records 344 corresponds to the sub-a-ERD field 336 of the object usage log in universal format records 334. The time stamp field 347 of the usage task data records 344 corresponds to the time stamp field 337 of the object (or information) usage log in uniform format records 334 and the time stamp field 327 of the object (or information) usage log records 324. Finally, the task ID field 348 is generated by the task boundary determination process 340. To reiterate, examples of this process 340 will be described in § 4.2.3.3 below.
As discussed above, one of the functions which may be carried out by the present invention is to generate a task analysis model in which tasks are clustered, sequenced, and assigned probabilities. The task analysis process 350 performs one or more of these functions based on the usage task data 342 and tunable parameters 359, to generate a task model 352. As shown, the task model 352 may include records 354 having a task ID field 355 and a cluster ID field 356, as well as records 357 having a cluster ID field 356 and a cluster probability field 358. The task ID fields 355 of the records 354 of the task model 352 correspond to the task ID fields 348 of the records 344 of the usage task data 342. The records 354 may also include sub-a-ERD fields 353 which correspond to the sub-a-ERD fields 346 of the records 344 of the usage task data 342. Typically, each cluster will have one or more associated tasks.
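The two record types of the task model 352 can be sketched as simple data structures. The sketch below is illustrative only. In particular, it assumes that a cluster probability is estimated as the fraction of logged tasks assigned to that cluster, which the text above does not specify. The class and function names are hypothetical, and the optional sub-a-ERD fields 353 are omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskClusterRecord:
    """Analogue of a record 354: a task ID (355) paired with a cluster ID (356)."""
    task_id: str
    cluster_id: str


@dataclass(frozen=True)
class ClusterProbabilityRecord:
    """Analogue of a record 357: a cluster ID (356) with a probability (358)."""
    cluster_id: str
    probability: float


def cluster_probabilities(task_records):
    """Estimate each cluster's probability as its share of the logged tasks.

    This relative-frequency estimate is an assumption made for illustration.
    """
    counts = Counter(record.cluster_id for record in task_records)
    total = sum(counts.values())
    return [
        ClusterProbabilityRecord(cluster_id, count / total)
        for cluster_id, count in sorted(counts.items())
    ]


task_records = [
    TaskClusterRecord("t1", "c1"),
    TaskClusterRecord("t2", "c1"),
    TaskClusterRecord("t3", "c1"),
    TaskClusterRecord("t4", "c2"),
]
model_probabilities = cluster_probabilities(task_records)
```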
§ 4.2.2.2 Run-Time Processes
Having provided an overview of off-line processes which may be carried out in accordance with the present invention, run-time processes which may be carried out by the present invention are now introduced with reference to
The task help content storage 395 may include records 396, each having a cluster ID field 397 and a task help content field 398. The task help content may be scripts, queries, executable objects, etc., designed to help a user perform a given task. The task help content field 398 may include the task help content itself, or may include an address(es) of a location(s) at which the task help content is stored.
The task model 352, together with marketing information content 390, may be used by a task based advertising process 380 to retrieve appropriate marketing information content 390 and present such content to a user via the user interface process 360. This process 380 is described in § 4.2.3.6 below.
The marketing information content storage 390 may include records 392, each having a cluster ID field 393 and a marketing information content field 394. The marketing information content may be image, audio, video, and/or text files which, when rendered, convey marketing information. The marketing information content field 394 may include the marketing information content itself, or may include an address(es) of a location(s) at which the marketing information content is stored.
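Both the task help content records 396 and the marketing information content records 392 pair a cluster ID with either the content itself or an address at which the content is stored. A minimal sketch of such a lookup follows. The names `ContentRecord` and `resolve_content`, the example address, and the caller-supplied `fetch` function are hypothetical and stand in for whatever retrieval mechanism an implementation would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ContentRecord:
    """A cluster ID plus either inline content or the address of stored content."""
    cluster_id: str
    content: Optional[str] = None
    address: Optional[str] = None


def resolve_content(records: List[ContentRecord], cluster_id: str,
                    fetch: Callable[[str], str]) -> List[str]:
    """Return all content associated with cluster_id.

    Inline content is returned directly; addressed content is retrieved with
    the caller-supplied fetch function (a placeholder for real storage access).
    """
    resolved = []
    for record in records:
        if record.cluster_id != cluster_id:
            continue
        if record.content is not None:
            resolved.append(record.content)
        elif record.address is not None:
            resolved.append(fetch(record.address))
    return resolved


records = [
    ContentRecord("dinner_and_movie", content="Now showing: a new release"),
    ContentRecord("dinner_and_movie", address="ads/movie-banner"),
    ContentRecord("stock_quote", content="Talk to a stockbroker"),
]
shown = resolve_content(records, "dinner_and_movie",
                        fetch=lambda address: "<content at " + address + ">")
```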
§ 4.2.3 Details of Processes
Having introduced the processes which the present invention may perform with reference to
§ 4.2.3.1 Object Log Process
Recall from the description of
The object usage log process 320 creates an object usage log 322 based on stored objects used by the application(s) process(es) 310.
As shown in step 440, at the end of a predetermined time period since the last user input (e.g., a day, a week, etc.), the time period is reset in step 450 and sessions are determined and assigned to the object ID values based on the saved user ID values and time/date stamp values in step 460. To reiterate, a session is defined as a period of activity (e.g., by a given user, or at a given computer) followed by a period of inactivity (e.g., by the given user, or at the given computer). Next, as shown in step 470, records including object ID and session ID (and optionally user ID and time/date stamp) information are stored. Processing then continues via return node 480.
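The session assignment of step 460 can be sketched as follows, assuming that a session ends when the gap between consecutive inputs from the same user exceeds a fixed inactivity threshold. The 30-minute default, the tuple layout of the logged events, and the name `assign_sessions` are assumptions made for illustration and are not taken from the description above.

```python
from datetime import datetime, timedelta


def assign_sessions(events, max_gap=timedelta(minutes=30)):
    """Assign a session ID to each (user_id, timestamp, object_id) event.

    A new session starts for a user when no earlier event exists for that
    user, or when the time since that user's previous event exceeds max_gap.
    """
    records = []
    last_seen = {}  # user_id -> (timestamp of previous event, its session ID)
    next_session_id = 0
    for user_id, timestamp, object_id in sorted(events, key=lambda e: (e[0], e[1])):
        previous = last_seen.get(user_id)
        if previous is None or timestamp - previous[0] > max_gap:
            session_id = next_session_id
            next_session_id += 1
        else:
            session_id = previous[1]
        last_seen[user_id] = (timestamp, session_id)
        records.append({
            "object_id": object_id,
            "session_id": session_id,
            "user_id": user_id,
            "timestamp": timestamp,
        })
    return records


events = [
    ("u1", datetime(2000, 1, 3, 9, 0), "doc_a"),
    ("u2", datetime(2000, 1, 3, 9, 5), "page_x"),
    ("u1", datetime(2000, 1, 3, 9, 10), "doc_b"),
    ("u1", datetime(2000, 1, 3, 11, 0), "sheet_c"),
]
session_records = assign_sessions(events)
```

Keying the inactivity check on the user ID corresponds to the "by a given user" reading of a session; keying on a machine identifier instead would give the "at a given computer" reading.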
§ 4.2.3.2 Uniform (Universal) Object Representation Process
As discussed above with reference to
Below, § 4.2.3.2.1 introduces different types (e.g., structured, active, and linear) of objects (or information). Then, advantages of representing various types of objects (or information) in a uniform way are discussed in § 4.2.3.2.2 below. Thereafter, an exemplary architecture which includes the uniform representation of the present invention, as well as the task analysis engine of the present invention, is described in § 4.2.3.2.3 below. Next, an exemplary uniform representation, namely annotated ERDs, is described in § 4.2.3.2.4 below. The ways in which various types of objects (or information) are mapped to a uniform representation are described in § 4.2.3.2.5 below. Finally, certain aspects of the uniform representation are described in § 4.2.3.2.6 below.
§ 4.2.3.2.1 Types of Objects
Some objects, such as relational database structures, XML (Extensible Markup Language), and RDF (Resource Description Framework), for example, may be characterized as “structured objects”. More specifically, relational databases are defined by elements structured into rows and columns of tables. XML defines trees based on containment relationships (e.g., an organization contains groups, and each of the groups contains members). In general, structured objects may be characterized as information having elements arranged in a regular organization. Typical structures used in information systems are reviewed in the text: Aho et al, Data Structures and Algorithms.
Other objects, such as DCOM and JAVA runtime objects for example, may be characterized as “active objects”. Active objects may be “objects”, in the object oriented language sense of the term. That is, objects consist of code which can change the state (or variables) of the object as a result of computations performed by a computer on behalf of an application or computer user. The code of an object makes the information “active” since the execution of the code can change the state information, independently of the representation itself. Further, techniques are available (See, e.g., U.S. Pat. Nos. 5,740,439, 5,682,536, 5,689,703, and 5,581,760, each of which is incorporated herein by reference) to “expose” machine executable instructions as objects.
Still other objects, such as text documents for example, may be characterized as “linear objects.” Linear objects (or information) are typified by a text stream, which is a linear arrangement of bytes. Linear information may also be encoded into a binary representation. Linear information may include in-line tags which divide the linear stream into segments. An example is a markup language, such as HTML, which inserts tags delimiting the text stream into paragraphs, font runs, and style elements.
Some objects may have more than one type. For example, HTML (Hyper-Text Markup Language) documents may include linear text, and may include hyper-text links defining a hierarchical structure.
§ 4.2.3.2.2 Advantages of a Uniform Object (or Information) Representation
Mapping different types of objects (or information) into a uniform representation has a number of advantages. First, instead of requiring different computational processes for the different types of objects (or information), computation or inference can occur uniformly over different types of information when a uniform representation is used. The results of such a computation can then be “mapped back” into a particular type of object (or information) such that processes intrinsic to that type of object can use the results. Thus, by permitting different types of objects (or information) to be mapped to a uniform representation and a uniform representation to be mapped back to a particular type of object (or information), a wide variety of application or user information may be shared between computational processes. Such computational processes may be of uniform construction, while particular object (or information) class information (e.g., linear, active, or structured) need not be dictated to the applications or users. The task analysis methods of the present invention are examples of such computational processes.
§ 4.2.3.2.3 Exemplary Software Architecture
Referring back to
§ 4.2.3.2.4 Exemplary Uniform Representation (Annotated ERDs)
In the following, an annotated ERD representation of objects (or information) is described. First, an overview of the known ERD semantic representation of data in a database is presented in § 4.2.3.2.4.1. Then, the annotated ERD representation, as well as some of its properties, is described in § 4.2.3.2.4.2.
§ 4.2.3.2.4.1 ERDs
To reiterate, the a-ERD (or annotated-Entity Relationship Diagram) format 524 provides a uniform way to gather and use different types of objects. The a-ERD 524 has a “vocabulary” and a “syntax”. The a-ERD vocabulary is defined by symbols. The a-ERD syntax defines rules for expressing objects as a graph structured in the a-ERD format. Basically, the a-ERD format 524 uses a sub-a-ERD (or “sub-graph of an annotated-entity relation diagram”) structure to express objects. Although ERDs are known to those skilled in the art, they are discussed below for the reader's convenience.
ERDs provide a semantic model of data in a database. Semantic modeling permits a database to (i) respond more intelligently to user interactions, and (ii) support more sophisticated user interfaces. ERDs were introduced in the paper, Peter Pin-Shan Chen, “The Entity Relationship Model-Toward a Unified View of Data,” International Conference on Very Large Data Bases, Framingham, Mass., (Sep. 22-24, 1975), reprinted in Readings in Database Systems, Second Edition, pp. 741-754, edited by Michael Stonebraker, Morgan Kaufmann Publishers, Inc., San Francisco, Calif. (1994) (hereafter referred to as “the Chen paper”).
Basically, the Chen paper defines an “entity” as a thing that can be distinctly identified. A “weak entity” is defined as an entity whose existence depends on some other entity. An entity may have a “property” or an “attribute” which draws its value from a corresponding value set. A “relationship” is an association among entities. Entities involved in a given relationship are “participants” in that relationship. The number of participating entities in a relationship defines the “degree” of the relationship. In entity relationship diagrams, entities are depicted with rectangles, properties are depicted with ellipses, and relationships are depicted with diamonds.
Exemplary entity relationship diagrams are shown in
In relation 600a, a restaurant ID number is associated with a particular restaurant and the cuisine type ID number is associated with a particular cuisine type. For example, restaurant ID number 4 corresponds to McDonalds. The following table lists exemplary cuisine types and associated ID numbers.
Although not shown in the relations, each restaurant may have other attributes such as a star rating (e.g., *, **, ***, ****, or *****), a cost rating (e.g., $, $$, $$$, $$$$, or $$$$$) and special options (e.g., Good Deal, Child Friendly, New, Romantic, 24-Hour, Afternoon Tea, Brunch, Delivery, Late Night, Live Entertainment, Noteworthy Wine List, Outdoor Seating, Pre-Theater Menu, Prix Fixe, Smoke Free, Smoke Friendly, View, etc.).
In the relation 600b, a neighborhood ID number is associated with a particular neighborhood and the person/place ID number is associated with a person or place. For example, neighborhood ID number 14 corresponds to the “Financial District” neighborhood of New York City. The following table lists exemplary New York City neighborhoods and associated ID numbers.
Executable software objects may also be expressed in a computer program application relation. For example, referring to
§ 4.2.3.2.4.2 Annotated ERDs
One problem with the entity relationship diagram model of database design is that it is subjective, as is apparent from the entity relationship diagrams depicted in
The a-ERD structure of the resource description format 524 removes such subjectivity from semantic representations of data (or objects). For example, in an ERD, a restaurant entity may have a cuisine type property. On the other hand, in an a-ERD, a restaurant entity may participate in a “has a” relationship with a cuisine type entity, and the cuisine type entity may participate in an “is served at” relationship with a restaurant entity. Basically, the a-ERD structure functions to (i) convert all attributes to entities by means of a “has a” relation, for example, (ii) permit relationships on relationships (e.g., a “location of” is an “attribute of”) or “n-ary” relationships, (iii) annotate the relations with text, and (iv) permit computed relationships. Each of these functions will be discussed below.
Thus, the annotated ERD uniform representation may be thought of as a collection of “elements”. Each element may have (i) an optional “label” which names the element (and may be non-unique), (ii) an optional “identifier” which uniquely identifies the element, and (iii) an optional value.
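By way of illustration only, the element structure and the conversion of attributes to entities by means of a “has a” relation might be sketched as follows. The class names Element and Relation, the function attributes_to_entities, and the use of a simple counter for identifiers are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Tuple


@dataclass(frozen=True)
class Element:
    """An a-ERD element; every field is optional, as described above."""
    label: Optional[str] = None        # names the element, may be non-unique
    identifier: Optional[int] = None   # uniquely identifies the element
    value: Optional[object] = None     # e.g. "Italian" for a cuisine type


@dataclass(frozen=True)
class Relation(Element):
    """A relation is an element that also lists its participants."""
    participants: Tuple[Element, ...] = ()


_next_id = count(1)  # hypothetical identifier source


def attributes_to_entities(entity_label, attributes):
    """Turn ERD-style attributes into entities joined by "has a" relations."""
    owner = Element(label=entity_label, identifier=next(_next_id))
    relations = []
    for name, value in attributes.items():
        attribute = Element(label=name, identifier=next(_next_id), value=value)
        relations.append(
            Relation(label="has a", identifier=next(_next_id),
                     participants=(owner, attribute))
        )
    return owner, relations
```

For example, calling attributes_to_entities("restaurant", {"cuisine type": "Italian"}) yields a restaurant entity and one “has a” relation whose second participant is a cuisine type entity. An inverse relation such as “is served at” could be added in the same way.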
In the ERD vernacular, an element is either an entity or a relation. (See, e.g.,
An a-ERD representation may be expressed in two (2) ways—as a list of predicates or as a directed hypergraph. For example,
§ 4.2.3.2.5 Mapping Various Types of Objects (or Information) to a Uniform Representation
As mentioned above, various types of objects (or information) may be mapped to a uniform representation. Examples of such mapping processes are presented below.
Tabular, graph, or hierarchical (e.g., tree) structures can all be mapped to a graph. First, as shown in
As shown in
Finally, as shown in
Tables of a relational database may be mapped to a hypergraph as follows. First, regarding the conversion of all attributes to entities, recall that in
Similarly, recall that in the ERD 700b of
Finally, recall that in the ERD 900a of
Although the a-ERD format 526 was described with reference to graphs in
- rendered by, at (internet resource, user, time)
where the entities are provided in parentheses and the relationships precede the entities. Similarly, the a-ERD of FIG. 8A may be represented as:
- offers (restaurant, special options), has a (restaurant, rating/cost/cuisine type)
and the a-ERD of FIG. 8B may be represented as:
- is in a (person/place, neighborhood).
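As a minimal sketch of how such a list of predicates might be held in memory and viewed as a hypergraph, each predicate below is stored as a relation label together with a tuple of participating entities. The function name to_hypergraph and the incidence-map layout are assumptions made for illustration.

```python
from collections import defaultdict

# Each predicate: (relation label, participating entities), as in the text.
predicates = [
    ("rendered by, at", ("internet resource", "user", "time")),
    ("offers", ("restaurant", "special options")),
    ("has a", ("restaurant", "rating/cost/cuisine type")),
    ("is in a", ("person/place", "neighborhood")),
]


def to_hypergraph(preds):
    """Treat every predicate as one hyperedge over its entities.

    Returns the vertex set, the degree of each hyperedge (number of
    participants), and a map from each entity to the indices of the
    hyperedges it participates in.
    """
    vertices = set()
    degrees = []
    incidence = defaultdict(list)
    for index, (_relation, entities) in enumerate(preds):
        degrees.append(len(entities))
        for entity in entities:
            vertices.add(entity)
            incidence[entity].append(index)
    return vertices, degrees, dict(incidence)


vertices, degrees, incidence = to_hypergraph(predicates)
```

Here the “rendered by, at” predicate has three participants, which an ordinary graph edge could not express directly; this is why a hypergraph is the natural reading of the predicate list.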
Each application process 310 may be represented by a full a-ERD. If, for example, the application is a word processor, the a-ERD may denote the relationship(s) among (executable software) “object” entities. If, on the other hand, the application is a resource browser, the a-ERD may denote the relationship(s) among databased resources.
Linear objects (or information) may be mapped to a hypergraph representation by providing a “precedes” or “follows” relationship, or a “precedes/follows” bi-directional relationship between pieces (e.g., words) of the linear information. For example, referring to
Finally, active objects (or information) may be mapped to a hypergraph representation. In the following description, two (2) types of active objects (or information) are considered. The first type is an object with both properties (or variables) and methods. The second type is an object with methods but no properties (or variables), also referred to as code.
The first type of active object, that is, one with both properties (or variables) and methods, may be mapped to a hypergraph representation as follows. First, an entity is created for each property (or variable) of the object, as well as for the object itself. Then relations that relate the property (or variable) entities to the object entity are created. For example, referring to
For the second type of active object, that is, object methods with no corresponding properties (or variables), each method is mapped to a set of entities that represent input and output parameters of the method. Appropriate relations are created between such entities. Finally, a container (or parameter list) is built for all of the entities. For example, referring again to
Note that all properties (or variables) and methods of an object need not be mapped to the uniform representation. For example, referring to
Note that mapping objects (or information) to the uniform representation may result in inefficient representations. For example, referring back to
§ 4.2.3.2.6 Other Aspects of the Uniform Representation
The uniform representation of the present invention can also handle intensional and extensional definitions. As shown in
The uniform representation of the present invention can also handle incremental attribution. That is, the uniform representation has been designed with the understanding that knowledge in the representation may be incomplete. For example, statements (or code) such as:
- marriage(A,B)^husband(A)^wife(B)
may be later attributed as a “heterosexual marriage” as opposed to a “homosexual marriage” as circumstances (e.g., laws) or applications change. This can be done through a contextual containment, that is, using containment as a context. More specifically, under certain contexts, the original statement is still valid, though it may be incorrect or incomplete. For example, if Hawaii recognizes homosexual marriages, the following statement (or code):
- context-of(marriage(a,b)^partner(a)^partner(b), Hawaii)
is appropriate. The same mapping techniques described above may be used to map between contexts.
The uniform representation of the present invention can handle ambiguity. Many predicate logic based systems, such as deductive databases or deductive object oriented databases for example, require logical consistency in the database. Thus, for example, in such applications, facts such as “color(A, Red)” and “color(A, Blue)” cannot both exist in the database if only one color is permitted for A. In particular, this would result in both of the following being true: color(A, Red) and color(A, ˜Red) (where ˜ is the logical NOT), which is a logical contradiction. The uniform representation of the present invention does not constrain knowledge to a particular logical formalism. Accordingly, both predicates may be simultaneously represented, notwithstanding the fact that they may define a logical contradiction. It is left to other computational processing to disambiguate these statements, possibly by searching for other contextual information, or by waiting for additional attribution as noted above (as illustrated below). For example, referring to
The uniform representation of the present invention handles multiple attribution. Since the uniform representation of the present invention handles ambiguity, incremental attribution, and multiple contexts, as described above, different applications with different “points of view” can add their attributes into the uniform representation. For example, referring to
The a-ERD format permits relations on relations. Referring, for example, to the a-ERD of
In the a-ERD format 524, the relationships are annotated with text using the vocabulary of the a-ERD format (e.g., “has a”, “is a”, “belongs to”, etc.). In ERDs, such text is typically for use by humans when designing a database or database application; the database or database application itself does not use the text. This is not the case with the a-ERD format.
The a-ERD format 524 permits computed relations. For example, referring to
The foregoing features enable the uniform representation of the present invention to handle real world cases of natural language query, where users make ambiguous statements in context, as well as applications having different “world views”. More formal representation systems are useful for very specific reasoning, but are too fragile for real world use.
Another uniform object format is Unified Modeling Language (or “UML”) which is used by Repository from Microsoft Corporation of Redmond, Wash. Yet another uniform object format is Meta Content Format (or “MCF”) from Apple Computer of Cupertino, Calif.
§ 4.2.3.3 Task Boundary Determination Process
Having described exemplary object log 320 and uniform representation 330 processes, exemplary task boundary determination processes 340 are now presented. Recall that the object log process defined sessions based on, for example, a period of activity followed by a period of inactivity. However, a user or users may perform more than one task in a given session or may perform only one task over a number of sessions. Thus, task boundaries should be defined. Defining task boundaries is not necessary, but it is believed that modeling task boundaries is far easier than modeling the tasks themselves. Naturally, the task boundary model used may introduce artifacts in the task analysis process.
There are a number of ways that task boundaries may be defined, some examples of which are presented below. Initially, simple, less sophisticated task boundary definition models are presented. Then, more sophisticated models are discussed.
In a first method for defining task boundaries, a task boundary is defined after an arbitrary number of user interactions. The arbitrary number may be stored as a task boundary model parameter 349. Although this model is easy to implement, it would produce a number of arbitrary boundaries assuming that different tasks require different numbers of steps by the user.
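A minimal sketch of this first method, assuming a session is available as an ordered list of user interactions, is given below. The function name fixed_count_boundaries and the parameter name n are hypothetical; n plays the role of the arbitrary number stored as a task boundary model parameter.

```python
def fixed_count_boundaries(interactions, n):
    """Place a task boundary after every n user interactions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [interactions[i:i + n] for i in range(0, len(interactions), n)]
```

Because the split ignores what the interactions actually are, a seven-step task with n set to three would be cut into three "tasks", which illustrates the arbitrary boundaries noted above.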
In a second method for defining task boundaries, each of a number of sub-a-ERDs is defined to correspond to a given task. A task boundary is defined whenever two consecutive user interactions use different sub-a-ERDs. The sub-a-ERDs may be stored as task boundary model parameters. The problem with this model is that it is based on predetermined assumptions of what tasks users will want to perform. Thus, the model is based on a static set of assumptions that may not anticipate tasks actually performed.
In a third method for defining task boundaries, the application process 310 has a defined a-ERD as discussed above. The a-ERD may be stored as a task boundary model parameter 349. Sub-a-ERDs are composed corresponding to user inputs (e.g., commands, queries, etc.). A task boundary is defined when two (2) consecutive disjoint sub-a-ERDs of the a-ERD are greater than a predetermined distance apart. The subject of determining the distance between sub-a-ERDs is described in § 4.2.3.4 below.
In a fourth method for defining task boundaries, like the third method, the application process 310 has a defined a-ERD, and sub-a-ERDs are composed corresponding to user inputs, as discussed above. The a-ERD may be stored as a task boundary model parameter 349. A task boundary is defined when two (2) consecutive disjoint sub-a-ERDs of the a-ERD are not joined in a query or user input. For example, sub-a-ERDs corresponding to the queries “Restaurants in the Theater District” and “Movies in TriBeCa” are disjoint and unconnected. On the other hand, sub-a-ERDs corresponding to the queries “Restaurants in the Theater District” and “Movies in the Theater District” are connected by the context of the queries (i.e., common neighborhood). Similarly, sub-a-ERDs corresponding to the queries “Romantic restaurants” and “Price of the same restaurant” are related by the context of the queries (i.e., “same restaurant”).
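The fourth method might be approximated as sketched below. As a simplifying assumption, each sub-a-ERD is flattened to a set of (entity, value) pairs, and two consecutive sub-a-ERDs are treated as joined when they share at least one pair. The function name split_on_disjoint is hypothetical, and contextual references such as “same restaurant” would need to be resolved to a concrete entity before this test could see them.

```python
def split_on_disjoint(sub_graphs):
    """Start a new task whenever consecutive sub-graphs share no element."""
    tasks = []
    for graph in sub_graphs:
        if tasks and tasks[-1][-1] & graph:
            tasks[-1].append(graph)   # joined to the previous input
        else:
            tasks.append([graph])     # disjoint: task boundary
    return tasks


# "*" marks an unbound (wildcard) entity in the query.
movies_theater = frozenset({("movie", "*"), ("neighborhood", "Theater District")})
restaurants_theater = frozenset({("restaurant", "*"), ("neighborhood", "Theater District")})
movies_tribeca = frozenset({("movie", "*"), ("neighborhood", "TriBeCa")})

tasks = split_on_disjoint([movies_theater, restaurants_theater, movies_tribeca])
```

In this run the first two queries fall into one task through their common neighborhood, while “Restaurants in the Theater District” followed by “Movies in TriBeCa” produces a boundary.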
Other methods for defining task boundaries may use a combination of any of the above four models.
If the user's interaction is supervised or limited such that task boundaries are explicitly entered by the user or explicitly defined, the steps of defining task boundaries need not be performed.
Referring back to
§ 4.2.3.4 Task Analysis Process
Having described exemplary object log 320, uniform object representation 330, and task boundary determination 340 processes, an exemplary task analysis process 350 is now described. Referring first to
The details of the exemplary step for determining task distances are now described with reference to
where:
- A and B are the graphs,
- v is a tunable parameter >1,
- c is the number of connected elements in the difference A-B, and
- i is the number of disjoint pieces of the difference A-B.
Other methods for penalizing connectedness in graph differences may also be used. Finally, as shown in step 1550, a final distance between the graphs is determined based on the determined intermediate distances and intersection. The final distance, d, may be defined as:
where: n_INTERSECT ≡ the number of vertices and edges in A∩B
Processing continues via return node 1560.
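The distance formulas themselves are not reproduced in this text. The sketch below is therefore an inference from the verbal description and worked numbers in § 4.3.3.2: the intermediate distance is taken as the sum, over the disjoint pieces of the difference graph, of v raised to the number of connected elements in each piece, and the final distance is taken as the intermediate distance divided by n_INTERSECT. The function names, and the choice to return infinity when the graphs do not intersect, are assumptions.

```python
def intermediate_distance(piece_sizes, v):
    """Sum of v ** c_i over the disjoint pieces of the difference graph.

    piece_sizes holds c_i, the number of connected elements in piece i.
    """
    if v <= 1:
        raise ValueError("the tunable parameter v must be greater than 1")
    return sum(v ** c for c in piece_sizes)


def final_distance(d_star, n_intersect):
    """Scale the intermediate distance by the size of the intersection."""
    if n_intersect == 0:
        return float("inf")  # assumption: no overlap means maximally distant
    return d_star / n_intersect
```

With v set to 10, a difference graph made of one piece of three connected elements gives 1000, and one made of two pieces of two elements each gives 200, matching the example of § 4.3.3.2.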
In some instances, the task analysis (e.g., clustering) should be performed on sequence independent tasks. In such cases, all of the sub-a-ERDs associated with users and tasks may be united into one large sequence independent task (or “SIT”) sub-a-ERD. Pattern matching algorithms may then be used to classify and cluster the SITs.
Results of the task clustering process may be used to further abstract the a-ERD representation of the application process 310. Recall, for example, that the a-ERD representation of the application process 310 may be a task boundary model parameter 349.
Other probabilities related to the analyzed tasks may also be determined. A hypergraph of the a-ERD (or “HAG”) corresponding to the application process 310 may be defined by (i) nodes corresponding to sub-a-ERDs corresponding to steps taken (or queries made) in a defined task, and (ii) directed edges corresponding to the order of steps taken (or queries made) in the defined task. In the HAG, nodes of degree one (1) having an exiting edge are defined as “start nodes” of the task and nodes of degree one (1) having an entering edge are defined as “end nodes” of the task. Probabilities corresponding to each HAG (or task) may be determined as discussed above with reference to
§ 4.2.3.5 Task Help Process
Having described exemplary processes for performing the off-line functions of the present invention, exemplary processes for performing the run-time functions of the present invention, namely task help and task based advertising, are now described. An exemplary process for performing the task help function will be described in this section with reference to
Referring now to
The help provided may be in the form of a script (or “wizard”), a query, a hint, navigational assistance, etc. For example, in the context of a word processing application, the task help process 370′ may recognize that the user is performing steps “close to” a “generate food recipe card” task cluster. In this case, the application may prompt the user, “IT SEEMS THAT YOU ARE TRYING TO ENTER A RECIPE. WOULD YOU LIKE HELP IN FORMATTING A RECIPE CARD?” If the user replies yes, recipe card formatting help is provided. In the context of an Internet website for providing information about things to do in a particular city, the task help process may recognize that the user is performing steps “close to” a “plan a romantic date” task cluster. For example, the user may have requested romantic restaurants located at the upper east side of New York City. In this case, the Internet website may provide gratuitous information regarding romantic things to do in the same neighborhood. For example, the Internet website may convey to the user, “IT SEEMS THAT YOU MAY BE PLANNING A ROMANTIC EVENING IN THE UPPER EAST SIDE. YOU MAY CONSIDER A HORSE DRAWN CARRIAGE RIDE THROUGH CENTRAL PARK. ALSO, “THE ENGLISH PATIENT” IS PLAYING AT THE FOLLOWING MOVIE THEATERS IN THE UPPER EAST SIDE . . . . ”
To summarize, the task help process 370 basically determines a task that a user is trying to perform, gets the associated task cluster ID from the task model 352, and uses the associated task cluster ID to find task help content 398 in the task help content storage 395. Naturally, the task help content field 398 may include an address(es) to a storage location(s) of task help content.
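The summarized flow might be sketched as follows. The sketch assumes that each task cluster is summarized by a single representative graph, that “close to” means a distance at or below a tunable threshold, and that help content is keyed by cluster ID; the function name find_help and all parameter names are hypothetical.

```python
def find_help(run_time_task, clusters, distance, threshold, help_content):
    """Return help for the nearest task cluster, or None if none is close.

    clusters maps cluster ID -> representative task graph (assumption).
    distance is any callable giving a distance between two task graphs.
    """
    best_id, best_distance = None, float("inf")
    for cluster_id, representative in clusters.items():
        d = distance(run_time_task, representative)
        if d < best_distance:
            best_id, best_distance = cluster_id, d
    if best_id is None or best_distance > threshold:
        return None  # not "close to" any known task cluster
    return help_content.get(best_id)
```

Any graph distance may be passed in; for a quick trial with tasks flattened to sets of labels, the size of the symmetric difference of two sets can serve as a stand-in.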
§ 4.2.3.6 Task Based Advertising Process
Having described an exemplary process for performing the task help function of the present invention, an exemplary process for performing the task based advertising function will be described with reference to
Referring now to
§ 4.2.4 Data Structures and Instructions
The above mentioned processes may be carried out by machine readable instructions. Referring to
§ 4.3 Operation of the Present Invention
Examples of building object usage logs in the environments depicted in
Examples of the operation of various processes, which may be performed by the present invention, are described in the context of an Internet website for providing content in response to queries in § 4.3.3 below.
§ 4.3.1 Building Object Usage Log Operation
Operations for building an object usage log, both in the context of the client-server environment 100 depicted in
§ 4.3.1.1 Client-Server Environment
In response to the communication 2510, the front end application process 116 forwards a request or command, in communication 2520, to the back end application process 124 via an output interface process (not shown), a network (not shown), and an input interface process (not shown). (See, e.g., elements 114, 130, and 122 of
Before, after, or concurrently with the communications 2530 and 2570, the back end application process 124 will also forward, in communication 2540, the object ID associated with the request or command of communications 2520 and 2530 to the object log process 320. In response, the object log process 320 submits, in communication 2560, the object ID and the time, to the object usage log 322 for storage. The time may be provided by a service process (not shown) of the server. The communication 2560 may also include a user ID.
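A minimal sketch of the kind of record the object log process might store, and of how sessions could later be cut from the log by a period of inactivity, is shown below. The class names UsageRecord and ObjectUsageLog, the sessions method, and the max_gap parameter are hypothetical; the sketch also ignores per-user separation of sessions for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class UsageRecord:
    object_id: int                  # ID of the requested object or resource
    time: datetime                  # when the request or command was logged
    user_id: Optional[str] = None   # optional, as in the communication above


class ObjectUsageLog:
    def __init__(self) -> None:
        self._records: List[UsageRecord] = []

    def log(self, object_id, user_id=None, now=None):
        """Store the object ID, the time, and optionally the user ID."""
        stamp = now if now is not None else datetime.now(timezone.utc)
        record = UsageRecord(object_id, stamp, user_id)
        self._records.append(record)
        return record

    def sessions(self, max_gap: timedelta):
        """Group records into sessions separated by more than max_gap."""
        grouped: List[List[UsageRecord]] = []
        for record in sorted(self._records, key=lambda r: r.time):
            if grouped and record.time - grouped[-1][-1].time <= max_gap:
                grouped[-1].append(record)
            else:
                grouped.append([record])
        return grouped
```

The optional now argument exists only so that a time source other than the system clock (for example, a service process of the server) can supply the timestamp.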
§ 4.3.1.2 Desktop Environment
In response to the communication 2610, the application program management process 150 forwards a request/command, in communication 2620, to the storage management process 160. In response, the storage management process 160 submits a request/command, in communication 2630, to the stored objects/resources 312 which returns, in communication 2640, a resource (e.g., an employee record) corresponding to the request in the communications 2620 and 2630 or an object (e.g., a spell check executable software object) corresponding to the command in communications 2620 and 2630. The storage management process 160 then returns, in communication 2650, the resource (e.g., the employee record) or the object (e.g., the spell check executable software object) to the application program management process 150. Thereafter, the application management process 150 returns, in communication 2660, the requested resource or the product of the object activity corresponding to the command.
Before, after, or concurrently with the communications 2620 and 2660, the application program management process 150 will also forward, in communication 2670, the object ID associated with the request or command of communication 2610 to the object log process 320. In response, the object log process 320 submits, in communication 2680, the object ID and the time, to the object usage log 322 for storage. The time may be provided by a service process (not shown) of the server. The communication 2680 may also include a user ID.
§ 4.3.2 Run-Time Functions Operations
The operations of the run-time functions (e.g., task help and task-based advertising), both in the context of the client-server environment 100 depicted in
§ 4.3.2.1 Client-Server Environment
In response to the communication 2705, the front end application process 116 forwards a request/command, in communication 2710, to the back end application process 124 via an output interface process (not shown), a network (not shown), and an input interface process (not shown). (See, e.g., elements 114, 130, and 122 of
Before, after, or concurrently with the communications 2720 and 2730, the back end application process 124 will also forward, in communication 2715, the object ID associated with the request or command of communication 2710 to the task help process 370 and/or the task-based advertising process 380. In response to the communication 2715, the task help process 370 and/or the task-based advertising process 380 compares the received object ID(s) with one or more task clusters of the task model 352 requested in communication 2740 and accepted in communication 2745. (Note that the task help process 370 or the task-based advertising process 380 may use a run-time graph constructed based on a number of user inputs as discussed above.) If the object ID(s) (or run-time graph) correspond to a task which is “close to” a given task cluster, then the task cluster ID is used to access appropriate help content 395 and/or marketing information content 390. More specifically, the task help process 370 and/or the task-based advertising process 380 submits a request 2750, including the cluster ID, to the task help content 395 and/or the marketing information content 390, respectively. In response, the task help and/or marketing information corresponding to the cluster ID of the request 2750 is returned to the task help process 370 and/or the task-based advertising process 380 in communication 2755. The task help process 370 and/or the task-based advertising process 380 then sends the help content and/or the marketing information content to the back end application process 124 in communication 2760. The back end application process 124 then forwards the help content and/or the marketing information content to the front end application process 116 in communication 2765. Finally, the help and/or marketing information is sent, in communication 2770, to the user interface process 112 where the help and/or marketing information is rendered.
§ 4.3.2.2 Desktop Environment
In response to the communication 2805, the application program management process 150 forwards a request/command, in communication 2810, to the storage management process 160. In response, the storage management process 160 submits a request or command, in communication 2820, to the stored objects/resources 312 which returns, in communication 2825, a resource (e.g., an employee record) corresponding to the request in the communication 2820 or an object (e.g., a spell check executable software object) corresponding to the command in communication 2820. The storage management process 160 then returns, in communication 2830, the resource (e.g., the employee record) or the object (e.g., the spell check executable software object) to the application program management process 150. Thereafter, the application management process 150 returns, in communication 2835, the requested resource or the product of the object activity corresponding to the command.
Before, after, or concurrently with the communications 2810 and 2835, the application program management process 150 will also forward, in communication 2815, the object ID associated with the request or command of communication 2805 to the task help process 370 and/or the task-based advertising process 380. In response to the communication 2815, the task help process 370 and/or the task-based advertising process 380 compares the object ID(s) received with one or more task clusters of the task model 352 requested in communication 2840 and accepted in communication 2845. (Note that the task help process 370 or the task-based advertising process 380 may use a run-time graph constructed based on a number of user inputs as discussed above.) If the object ID(s) (or run-time graph) correspond to a task which is “close to” a given task cluster, then the task cluster ID is used to access appropriate help content 395 and/or marketing information content 390. More specifically, the task help process 370 and/or the task-based advertising process 380 then submits a request 2850, including the cluster ID, to the task help content 395 and/or the marketing information content 390, respectively. In response, the help and/or marketing information corresponding to the cluster ID of the request 2850 is returned to the task help process 370 and/or the task-based advertising process 380 in communication 2855. The task help process 370 and/or the task-based advertising process 380 then sends the help content and/or the marketing information content to the application management process 150 in communication 2860. Finally, the help or marketing information is sent, in communication 2870, to the user interface process 140 where the help and/or marketing information is rendered.
§ 4.3.3 Examples of Operations of Processes of the Present Invention
In the following examples, it is assumed that an Internet website includes databased information regarding restaurants and movie theaters in New York City. In the following sections,
§ 4.3.3.1 Operation of the Task Graph Generation Process
An example of the operation of the task graph generation process 1310′ of
- [cuisine, CTID (18), RID (*)], and [neighborhood, PPID (*), NID (21)]
where:
- “cuisine” is the name of a relation (table) in the database (See, e.g., relation 600a of FIG. 6A.);
- “CTID” is a “cuisine type ID” attribute of the cuisine relation;
- “(18)” is a value, specifically “Italian”, associated with the CTID attribute;
- “RID” is a “restaurant ID” attribute of the cuisine relation;
- “(*)” is a wildcard value associated with the RID attribute;
- “neighborhood” is the name of a relation in the database (See, e.g., relation 600b of FIG. 6B.);
- “PPID” is a “Person-place ID” attribute of the neighborhood relation;
- “(*)” is a wildcard value associated with the PPID attribute;
- “NID” is a “neighborhood ID” attribute of the neighborhood relation; and
- “(21)” is a value, specifically “Little Italy”, associated with the NID attribute.
The canonical form of the second query may be:
- [cuisine, CTID (*), RID (143)]
- where: “cuisine” is the name of a relation (table) in the database (See, e.g., relation 600a of
FIG. 6A .);- “CTID” is a “cuisine type ID” attribute of the cuisine relation;
- “(*)” is a wildcard value associated with the CTID attribute;
- “RID” is a “restaurant ID” attribute of the cuisine relation; and
- “(143)” is a value—specifically “Oceana”—associated with the RID attribute.
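As an illustration of the canonical forms listed above, each bracketed term can be held as a relation name plus an ordered list of attribute/value pairs, with a wildcard marker standing in for “(*)”. The class name `CanonicalQuery` and the use of `None` as the wildcard are assumptions made for this sketch; only the relation names, attribute names and values come from the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

WILDCARD = None  # assumed stand-in for the "(*)" wildcard value


@dataclass(frozen=True)
class CanonicalQuery:
    """One bracketed term: a relation and its attribute/value pairs."""
    relation: str
    attributes: Tuple[Tuple[str, Optional[int]], ...]

    def __str__(self) -> str:
        parts = [self.relation]
        for name, value in self.attributes:
            shown = "*" if value is WILDCARD else value
            parts.append(f"{name} ({shown})")
        return "[" + ", ".join(parts) + "]"


# First query: cuisine type 18 ("Italian") in neighborhood 21 ("Little Italy").
first_query = (
    CanonicalQuery("cuisine", (("CTID", 18), ("RID", WILDCARD))),
    CanonicalQuery("neighborhood", (("PPID", WILDCARD), ("NID", 21))),
)

# Second query: restaurant 143 ("Oceana"), any cuisine type.
second_query = (
    CanonicalQuery("cuisine", (("CTID", WILDCARD), ("RID", 143))),
)

for term in first_query + second_query:
    print(term)
```

Making the terms immutable and hashable is a design choice for the sketch: it lets them serve directly as graph vertices or dictionary keys.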
Recall from
Recall from step 1410 of
Recall from step 1430 of
§ 4.3.3.2 Operation of the Distance Determination Process
An example of the operation of the graph distance determination process 1320′ of
Recall from step 1510 of
Recall from step 1520 of
In general, the more connected the difference graph is, the more different the queries (or graphed tasks) are. Recall from § 4.2.3.4 above that an intermediate distance between graphs is based on a sum, over all pieces of the difference graph, of a tunable parameter “v” raised to the number of connected elements “c_i” in piece “i” of the difference graph. Thus, graphed tasks are more distant, and hence more different, as the connectedness of their differences increases. In this example:
d*(22a, 22c) = d*(22c, 22a) = v^3
and
d*(22a, 22b) = d*(22b, 22a) = v^2 + v^2
Thus, for example, if the tunable parameter v is 10, d*(22a, 22c) = d*(22c, 22a) = 10^3 = 1000, while d*(22a, 22b) = d*(22b, 22a) = 10^2 + 10^2 = 200.
Since, in this example, the intersection 22a∩22c contains two vertices (i.e., the time and place vertices) and the intersection 22a∩22b contains one vertex (i.e., the movie vertex), the final distance d(22a, 22c) is 1000/2=500 and the final distance d(22a, 22b) is 200/1=200.
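The arithmetic of this example can be reproduced with a short sketch. The function names, and the choice to raise an error when the graphs share no vertices, are assumptions. The text specifies only the sum of v raised to the size of each connected piece of the difference graph, followed by division by the number of common vertices.

```python
from typing import Sequence


def intermediate_distance(piece_sizes: Sequence[int], v: float) -> float:
    """Sum of v ** c_i over the connected pieces of the difference graph."""
    if v <= 1:
        raise ValueError("the tunable parameter v should be larger than one")
    return sum(v ** c for c in piece_sizes)


def final_distance(piece_sizes: Sequence[int], common_vertices: int, v: float) -> float:
    """Intermediate distance divided by the number of shared vertices."""
    if common_vertices <= 0:
        # Assumption: the text does not say how disjoint graphs are handled.
        raise ValueError("graphs must share at least one vertex")
    return intermediate_distance(piece_sizes, v) / common_vertices


# 22a vs. 22c: one piece of three connected elements, two shared vertices.
print(final_distance([3], 2, 10))     # 500.0
# 22a vs. 22b: two pieces of two connected elements each, one shared vertex.
print(final_distance([2, 2], 1, 10))  # 200.0
```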
In this example, the tasks graphed in
As can be appreciated, the tunable parameter “v” should always be larger than one. Further, the larger the value of the tunable parameter v, the more heavily relatively “connected” difference graphs are penalized; that is, they are made, or assumed to be, more distant.
§ 4.3.3.3 Operation of the Task Clustering Process
An example of the operation of the task clustering process 1220′ of
Recall from step 1610 of
Recall from steps 1620, 1630, 1640 and 1650 that the process of clustering and redetermining distances continues until the distance of the least distant task(s) and/or cluster(s) is greater than a first predetermined value or, alternatively, while the number of clusters is greater than a second predetermined value. The first and/or second predetermined values are tunable parameters.
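The merge loop can be sketched as below, using only the distance-based stopping rule. Two things here are assumptions rather than part of the description: the name `max_merge_distance` for the first predetermined value, and the use of average pairwise distance to redetermine the distance between clusters after a merge.

```python
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_tasks(
    tasks: Sequence[T],
    distance: Callable[[T, T], float],
    max_merge_distance: float,
) -> List[List[T]]:
    """Repeatedly merge the two least distant clusters until they are too far apart."""
    clusters: List[List[T]] = [[t] for t in tasks]

    def linkage(a: List[T], b: List[T]) -> float:
        # Assumed rule: average distance over all cross-cluster pairs.
        return sum(distance(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the least distant pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > max_merge_distance:
            break  # least distant pair exceeds the tunable threshold
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

A cluster-count condition could be added to the `while` test to model the second predetermined value. It is left out here to keep the sketch short.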
As can be appreciated from the foregoing description, the present invention teaches a tool for analyzing tasks being performed by users on a computer. A generated task analysis model may then be used to help (i) users complete a task, (ii) application program developers design programs which help users complete popular tasks, (iii) resource server developers design a topology or resource server to help users complete popular tasks, and (iv) advertisers target “task-relevant” marketing information to computer users.
Claims
1-18. (canceled)
19. A method for representing structured, linear, and active information in a uniform way, the method comprising steps of:
- mapping structured information to a uniform representation;
- mapping linear information to the uniform representation; and
- mapping active information to the uniform representation.
20. The method of claim 19 wherein the uniform representation is a collection of elements,
- wherein each of the elements is one of an entity and a relation, and
- wherein a relation relates two ordered elements.
21. The method of claim 20 wherein a container entity contains other elements.
22. (canceled)
23. The method of claim 20 wherein the uniform representation is expressed as a list of predicates.
24. The method of claim 20 wherein the step of mapping structured information to a uniform representation, includes sub-steps of:
- if the information is structured as a hierarchy, representing nodes of the hierarchy as entities, and generating parent/child relations between the entities, consistent with the hierarchy.
25. The method of claim 20 wherein the step of mapping structured information to a uniform representation, includes sub-steps of:
- if the information is structured as a table, representing the table as an entity, representing column names of the table as entities, representing row numbers of the table as entities, generating contains relations between the entity representing the table and each of the entities representing the column names and row numbers, generating entities corresponding to information contained in the table, and generating contains relations between the entities representing the column names and row numbers and entities corresponding to the information contained in the table, consistent with the table.
26. (canceled)
27. The method of claim 20 wherein the step of mapping structured information to a uniform representation, includes sub-steps of:
- if the information is structured as an entity-relationship diagram, representing each of any attributes as an entity, and generating has-a relations between each of the entities representing attributes and the entities to which the attributes belonged, consistent with the entity-relationship diagram.
28. The method of claim 20 wherein the step of mapping linear information to a uniform representation, includes sub-steps of:
- parsing the linear information into pieces;
- representing each of the pieces with an entity; and
- generating a precedes/follows relation between each of the entities representing adjacent parsed pieces.
29. The method of claim 28 wherein the pieces are selected from a group consisting of words, sentences, paragraphs, sections, headings, phrases, and alphanumeric strings.
30. The method of claim 20 wherein the step of mapping linear information to a uniform representation, includes sub-steps of:
- representing a name of the sequence with an entity;
- representing the information of the sequence with an entity; and
- generating a data relation between the entity representing the name of the sequence and the entity representing the information of the sequence.
31. The method of claim 20 wherein the step of mapping active information to a uniform representation, includes sub-steps of:
- representing a name of the active information as an entity;
- representing properties, if any, of the active information as entities;
- generating an of relation between the entity representing the name of the active information and each of the entities, if any, representing the properties of the active information;
- representing methods, if any, of the active information as entities; and
- generating a to relation between the entity representing the name of the active information and each of the entities, if any, representing the methods of the active information.
32. The method of claim 31 wherein if a method gets a property, then further:
- generating a get relation between an entity representing the method and an entity representing the property.
33. The method of claim 31 wherein if a method sets a property, then further:
- generating a set relation between an entity representing the method and an entity representing the property.
34. The method of claim 31 wherein, if a method of the active information has parameters, then further:
- generating entities representing each of the parameters;
- generating a parameter list entity;
- generating a parameters-of relation between the entity representing the method and the parameter list entity; and
- generating a contains relation between the parameter list entity and each of the entities representing the parameters.
35-36. (canceled)
37. A machine readable medium having machine executable instructions which, when executed by the machine perform steps for representing structured, linear, and active information in a uniform way, the steps comprising:
- mapping structured information to a uniform representation;
- mapping linear information to the uniform representation; and
- mapping active information to the uniform representation.
38. The method of claim 37 wherein the uniform representation is a collection of elements,
- wherein each of the elements is one of an entity and a relation; and
- wherein a relation connects two ordered elements.
39-48. (canceled)
49. The method of claim 38 wherein the step of mapping active information to a uniform representation, includes sub-steps of:
- representing a name of the active information as an entity;
- representing properties, if any, of the active information as entities;
- generating an of relation between the entity representing the name of the active information and each of the entities, if any, representing the properties of the active information;
- representing methods, if any, of the active information as entities; and
- generating a to relation between the entity representing the name of the active information and each of the entities, if any, representing the methods of the active information.
50-54. (canceled)
55. A machine readable medium having a data structure comprising at least three elements, each of the elements being one of an entity and a relation,
- wherein a relation connects two ordered elements; and
- wherein an entity may represent any one of an attribute of an entity-relationship diagram; an entry from information structured as a hierarchy; a column name of a table; a row number of a table; an entry from a table; text from information structured as a sequence; a name of active information; a property of active information; a method of active information; and a parameter of a method of active information.
56. The machine readable medium of claim 55 wherein, in the data structure, an entity may contain elements.
57. The machine readable medium of claim 55 wherein the data structure is expressed as a list of predicates.
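By way of illustration only, the uniform representation recited above (entities and ordered relations, expressed as a list of predicates) might be sketched as follows for linear and tabular information. The entity naming scheme, the function names, and the simplification that relations connect only entities (not other relations) are assumptions of this sketch.

```python
from typing import List, Tuple

# A predicate is (relation name, first element, second element); order matters.
Predicate = Tuple[str, str, str]


def map_linear(text: str) -> List[Predicate]:
    """Parse text into words and relate each word to the one before it."""
    words = text.split()
    entities = [f"w{i}:{w}" for i, w in enumerate(words)]
    return [("follows", later, earlier)
            for earlier, later in zip(entities, entities[1:])]


def map_table(name: str, columns: List[str], rows: List[List[str]]) -> List[Predicate]:
    """Represent a table, its column names, row numbers and entries as entities."""
    predicates: List[Predicate] = []
    for col in columns:
        predicates.append(("contains", name, f"col:{col}"))
    for r, row in enumerate(rows):
        row_entity = f"row:{r}"
        predicates.append(("contains", name, row_entity))
        for col, value in zip(columns, row):
            cell = f"cell:{r}:{col}:{value}"
            # Each entry is contained by both its column and its row.
            predicates.append(("contains", f"col:{col}", cell))
            predicates.append(("contains", row_entity, cell))
    return predicates


print(map_linear("find a restaurant"))
print(map_table("cuisine", ["CTID", "RID"], [["18", "143"]]))
```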
Type: Application
Filed: Jun 9, 2005
Publication Date: Oct 13, 2005
Applicant: Microsoft Corporation (Redmond, WA)
Inventor: Edward Jung (Bellevue, WA)
Application Number: 11/148,921