Method And System For Defining Relationships Among Labels
In a content system where labels are used to organize content, relationships between labels may be defined. A relationship may be unidirectional or bidirectional. A label may have multiple relationships to or from other labels. When the user selects a first label, information corresponding to a second label may be displayed in accordance with the relationship between the first and second labels. Relationships between labels may also be inferred by examining the labels and the content associated with the labels.
The present application claims priority to and is a continuation of U.S. patent application Ser. No. 11/731,686, filed Mar. 30, 2007, now U.S. Pat. No. 8,533,232, which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The disclosed embodiments relate generally to content categorization, and more particularly, to methods and systems for defining relationships among labels or tags that may be associated with content.
BACKGROUND
The Internet has become a powerful medium for storage and sharing of content. Many web-based services, such as photo-sharing sites, blogs, and social bookmarking sites, are available for users to store content and to share content with other users. The growth of these services has also led to the growth of “folksonomy,” in which users categorize content by assigning freely chosen keywords, tags, or labels to the content.
Folksonomy has some advantages, such as user freedom and its distributed nature. However, folksonomy also has some disadvantages. Because users are free to make up their own tags, different users may make up different tags for the same meaning, and a single tag may have multiple meanings. Furthermore, folksonomies tend to be unstructured. These disadvantages hinder efficient indexing and searching of tagged content by search engines.
Accordingly, there is a need for a more efficient manner of managing content tags.
SUMMARY
According to some embodiments, a method of labeling data items includes identifying a first label and a second label, the labels being distinct from a logical storage scheme associated with the data items; receiving a specification of a relationship between the first label and the second label; associating the first label with the second label in accordance with the relationship; applying the first label to the data items; and in response to a selection of the second label, presenting information associated with the data items based on the relationship.
According to some embodiments, a method of associating labels includes identifying a first label and a second label that are associated with respective data items; examining the first and second labels and the respective data items; inferring a relationship between the first label and the second label based on the examination; and associating the first label with the second label in accordance with the relationship.
According to some embodiments, the aforementioned methods may be performed by a system having memory and one or more processors.
According to some embodiments, instructions for performing the aforementioned methods may be included in a computer program product.
Like reference numerals refer to corresponding parts throughout the drawings.
DESCRIPTION OF EMBODIMENTS
A user can tag or label content with tags or labels (both “tags” and “labels” are used interchangeably throughout this description) and specify relationships between individual content items by defining relationships between the tags or labels. The relationships may be selected from a pre-specified set. Arbitrary relationships may also be specified. Additionally, relationships between labels may be inferred by examining the labels and the content associated with the labels.
The clients 102 are devices from which a user 103 may access content. The client may be any device capable of communicating with other computers, devices, and so forth through the network 106. Examples of client devices may include, without limitation, desktop computers, notebook (or laptop) computers, personal digital assistants (PDAs), mobile phones, network terminals, and so forth. In some embodiments, the client device includes one or more applications for communicating with other computers or devices through the network 106. Examples of such applications include, without limitation, web browsers, email applications, and instant messaging or chat applications. The client device may also include utility applications, such as calendar/scheduling, contact management, and/or task management applications.
The content system 104 stores content or data items and provides same to clients 102. The content or data items may include documents such as web pages, electronic messages, images, other digital media content such as audio and video files, links to such, etc. In some embodiments, the content system 104 may include one or more content servers.
The content system 104 allows a user to organize content by labeling or tagging the content. A user may assign one or more labels or tags to his content or data items. A label may be completely arbitrary, or may be chosen to provide a hint of the subject matter of the content. A label may be assigned to multiple data items, and a data item may have multiple labels assigned to it.
- an operating system 210 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a network communication module 212 that is used for connecting the content server 200 to other computers via the one or more communication network interfaces 204 (wired or wireless), such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- user data 214 for storing per-user data;
- a labeling module 222 for labeling content and setting relationships between labels; and
- a relationship discovery module 224 for discovering and suggesting possible relationships between labels.
The user data 214 stores data and content associated with user accounts 215, or with other digital data or content designated by a user (e.g., images from the World Wide Web or other network 106). The data or content stored under a user account 215 may include the following, or a subset thereof:
- content 216, which may include content uploaded to the content server 200 by the user (or someone else) and documents for which the user has created links, pointers, or bookmarks;
- labels or tags 218, for labeling or tagging the content 216;
- label relationships 220, for specifying relationships between labels; and
- a label-content mapping or table 221 for mapping associations between labels and content or data items.
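The per-account data listed above can be pictured with a small in-memory sketch. It is illustrative only: the class name `UserAccount`, its field names, and the use of Python sets and dictionaries are assumptions made here, not structures specified by the description. It shows that a label may be applied to several items and that an item may carry several labels.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class UserAccount:
    """Hypothetical in-memory stand-in for the data kept under a user account."""

    content: Dict[str, str] = field(default_factory=dict)  # item id -> item
    labels: Set[str] = field(default_factory=set)
    # Each relationship is a (tail label, relationship name, head label) triple.
    relationships: Set[Tuple[str, str, str]] = field(default_factory=set)
    # Label-content mapping: label -> ids of the items that carry it.
    label_to_items: Dict[str, Set[str]] = field(
        default_factory=lambda: defaultdict(set)
    )

    def apply_label(self, label: str, item_id: str) -> None:
        """Tag an existing item, creating the label if it is new."""
        if item_id not in self.content:
            raise KeyError(f"unknown item: {item_id}")
        self.labels.add(label)
        self.label_to_items[label].add(item_id)

    def labels_of(self, item_id: str) -> Set[str]:
        """Reverse lookup: every label currently applied to an item."""
        return {
            label
            for label, items in self.label_to_items.items()
            if item_id in items
        }


# Illustrative data only.
account = UserAccount()
account.content["img1"] = "golden_gate.jpg"
account.content["img2"] = "alcatraz.jpg"
account.apply_label("vacation", "img1")
account.apply_label("vacation", "img2")
account.apply_label("bridges", "img1")
```

A production system would more likely keep these structures in a database; the mapping could equally be stored from items to labels, as the description allows either direction.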
Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 206 may store a subset of the modules and data structures identified above. Furthermore, memory 206 may store additional modules and data structures not described above.
Although
- an operating system 310 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a network communication module 312 that is used for connecting the client 102 to other computers via the one or more communication network interfaces 304 (wired or wireless), such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on; and
- a client application 314.
The client application enables users of the client 102 to access the content system 104 and hosts 108 (
Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 306 may store a subset of the modules and data structures identified above. Furthermore, memory 306 may store additional modules and data structures not described above.
The graph 400 includes four nodes representing labels: A 402, B 404, C 406, and D 408. The nodes are connected to each other by various directed edges. The edges specify the relationships between the labels. For example, nodes A 402 and B 404 are connected by a bidirectional “related_to” edge and a unidirectional “child_of” edge. The “related_to” edge specifies that labels A and B are related to each other. The “child_of” relationship edge specifies that label B is a child of label A. That is, label B is a sub-label of label A, similar to the relationship between a folder and sub-folders within the folder.
Label nodes A 402 and C 406 are connected by two bidirectional edges: “synonym_of” and “related_to.” These edges specify that labels A and C are synonyms of each other and are related to each other. Label nodes A 402 and D 408 are connected by a unidirectional “more_important_than” edge, which specifies that label A (and content associated with label A) is more important than label D (and content associated with label D).
More generally, between any two nodes representing labels, there may be any number of edges, unidirectional or bidirectional, representing relationships between the labels. An example is shown with regard to label nodes C 406 and D 408. Nodes C and D are connected by a bidirectional edge “relationship_x” and two unidirectional edges “relationship_y” and “relationship_z,” which run in opposite directions. It is possible for two nodes to have a unidirectional relationship in one direction and another, unrelated unidirectional relationship in the opposite direction. A relationship edge is represented by a bidirectional edge if the relationship is mutual. For example, synonym and related-to relationships are represented by bidirectional edges because both of these relationships are mutual; “A is a synonym of B” implies a mutual relationship “B is a synonym of A,” and “A is related to B” implies a mutual relationship “B is related to A.” A relationship is represented by a unidirectional edge if the relationship is not mutual. For example, child-of and more-important-than relationships are represented by unidirectional edges because neither of these relationships is mutual. Indeed, a unidirectional relationship often implies an opposite relationship in the other direction. For example, “A is a child of B” implies the opposite relationship “B is a parent of A,” and “A is more important than B” implies the opposite relationship “B is less important than A.”
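The graph described above can be sketched as a directed multigraph in which each ordered pair of labels holds a set of named edges. The sketch below is one possible representation under stated assumptions: the class `LabelGraph`, the choice to store a mutual relationship in both directions, and the inverse names `parent_of` and `less_important_than` are illustrative and are not taken from the description.

```python
from collections import defaultdict

# Which relationships are mutual, and which imply an opposite relationship,
# follows the examples in the text; the inverse identifiers are assumptions.
MUTUAL = {"synonym_of", "related_to"}
INVERSE = {
    "child_of": "parent_of",
    "more_important_than": "less_important_than",
}


class LabelGraph:
    """Hypothetical directed multigraph: any number of named edges per pair."""

    def __init__(self):
        self._edges = defaultdict(set)  # (tail, head) -> {relationship names}

    def add(self, tail, relationship, head):
        self._edges[(tail, head)].add(relationship)
        if relationship in MUTUAL:
            # A mutual relationship is stored in both directions.
            self._edges[(head, tail)].add(relationship)

    def relationships(self, tail, head):
        """Relationships stored from tail to head."""
        return set(self._edges.get((tail, head), set()))

    def implied(self, tail, head):
        """Opposite relationships implied by edges stored from head to tail."""
        reverse = self._edges.get((head, tail), set())
        return {INVERSE[name] for name in reverse if name in INVERSE}


# Edges among labels A, B, C and D as described for graph 400.
graph = LabelGraph()
graph.add("A", "related_to", "B")
graph.add("B", "child_of", "A")
graph.add("A", "synonym_of", "C")
graph.add("A", "related_to", "C")
graph.add("A", "more_important_than", "D")
```

Storing mutual edges twice keeps lookups simple at the cost of redundancy; a single undirected record per mutual relationship would work as well.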
A user accesses his account in the content system 104 and provides a first label and a second label (hereinafter “M” and “N,” respectively, for convenience) (502). The user may create the label(s), if they have not been created already. In some embodiments, a label is simply a string of characters. In some other embodiments, a label may include a character string and/or an image such as an icon. If the desired label has already been created, the user can also select the label from a list of existing labels. In some embodiments, the content system 104 may also provide one or more predefined labels for use by the user.
The user specifies a relationship between labels M and N (504). The relationship is identified by a character string and/or an image, such as an icon. The user may select a relationship from a list of predefined relationships provided by the content system 104. These predefined relationships include ones that are considered to be useful to users in their content organization and management tasks. These predefined relationships have semantic meanings that are known to the content system 104 and that should be apparent to the user from the character string identifying the relationship. In some embodiments, the predefined relationships include:
- child-of: one label and contents associated with the label are subordinate to another label within a hierarchy; similar to the relationship between a sub-folder and a folder;
- synonym-of: two labels are synonyms of each other or are equivalents of each other;
- related-to: two labels are not necessarily equivalents but are related nonetheless;
- member-of: one label is a member of a set identified by another label;
- more-important-than: content associated with one label has higher priority than content associated with another label;
- prerequisite-of: content (e.g., a task in a task list) associated with one label is necessary to operation of or on content associated with another label; and
- current-version-of: content associated with a first label is the most recent or newest of a class of content that is associated with both the first label and the second label.
It should be appreciated that the predefined relationships described above are merely exemplary. The content system 104 may provide other predefined relationships in addition to or in lieu of those described above.
In some embodiments, the user may also create an entirely arbitrary relationship by entering a character string identifying the relationship. The semantic meaning of such an arbitrary relationship is known only to the user-creator of the relationship, unlike the predefined relationships, whose semantic meanings are known to the content system 104.
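The distinction between predefined and arbitrary relationships can be illustrated with a short sketch. The enumeration below lists the exemplary predefined relationships named above; the function name `resolve_relationship`, the case-insensitive matching, and the returned pair are assumptions made for illustration.

```python
from enum import Enum


class Predefined(Enum):
    """The exemplary predefined relationships; string values are illustrative."""

    CHILD_OF = "child-of"
    SYNONYM_OF = "synonym-of"
    RELATED_TO = "related-to"
    MEMBER_OF = "member-of"
    MORE_IMPORTANT_THAN = "more-important-than"
    PREREQUISITE_OF = "prerequisite-of"
    CURRENT_VERSION_OF = "current-version-of"


def resolve_relationship(name):
    """Return (relationship name, whether its semantics are known to the system).

    A name matching a predefined relationship is known to the system; any
    other non-empty string is accepted as an arbitrary, user-defined one.
    """
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("a relationship needs a non-empty identifying string")
    try:
        # Case-insensitive matching is an assumption of this sketch.
        return Predefined(cleaned.lower()).value, True
    except ValueError:
        return cleaned, False
```

For an arbitrary relationship, the system can store and display the string but cannot act on its meaning, which is known only to the user who created it.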
The labels M and N are associated with each other in accordance with the specified relationship (506). The labels are applied to respective content or data items (508). That is, the content or data items are tagged with the labels and are associated with the labels in the content system 104. The content items associated with the labels, including items that were associated with the labels before the creation of the relationship, are associated with each other in accordance with the relationship. The associations between labels and data items may be stored as a table of label-data item associations or some other sort of mapping from labels to data items or vice versa.
The user may later select one of the labels, say label M, in order to view information associated with that label (510). In some embodiments, the user can select the label by clicking on the label in the user interface. Information corresponding to content or data items associated with labels having a relationship with label M may be displayed to the user, in accordance with the relationship between labels M and N (512). For example, if label M is related to label N, then if label M is selected, information corresponding to content associated with label N may be displayed as related to label M. As another example, if content associated with labels M and N are tasks in a task list and label N is “a prerequisite of” label M, then when label M is selected, tasks associated with label N may be shown as prerequisites to the completion of tasks associated with label M.
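The selection step (510, 512) can be sketched as a lookup over stored relationships. This is a minimal illustration, assuming relationships are kept as (tail, relationship, head) triples and that a mutual relationship may be stored in only one direction; the function name, the `MUTUAL` set, and the sample task data are hypothetical.

```python
MUTUAL = frozenset({"related-to", "synonym-of"})


def items_for_selection(selected, relationships, label_to_items, mutual=MUTUAL):
    """Return (own items, {relationship name: items of the related labels}).

    relationships: iterable of (tail, relationship, head) triples.
    label_to_items: dict mapping each label to a set of item ids.
    """
    own = set(label_to_items.get(selected, set()))
    related = {}
    for tail, relationship, head in relationships:
        if head == selected:
            other = tail  # e.g. N is a prerequisite of the selected label M
        elif tail == selected and relationship in mutual:
            other = head  # mutual relationships apply in both directions
        else:
            continue
        items = label_to_items.get(other, set())
        related.setdefault(relationship, set()).update(items)
    return own, related


# Illustrative task-list data only.
relationships = [("N", "prerequisite-of", "M"), ("M", "related-to", "P")]
label_to_items = {
    "M": {"paint walls"},
    "N": {"buy paint", "tape trim"},
    "P": {"color swatches"},
}
own, related = items_for_selection("M", relationships, label_to_items)
```

Here the tasks under label N come back grouped under "prerequisite-of", so an interface could show them as prerequisites of the tasks under label M.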
A set of labels and content associated with the labels are identified (602). The labels and the content are examined (604). In some embodiments, the examination includes examining the labels for similarity, common substrings, etc. and examining active associations and relationships between labels and content. In some other embodiments, the examination goes further and examines the content itself.
In some embodiments, the examination includes applying pre-specified rules to the labels and content. The rules specify the circumstances under which a relationship between two labels may be inferred. For example, a rule may specify that if the content associated with a first label is a proper subset of content associated with a second label, then possible label relationships that may be inferred include, among others, a hierarchical or a “related to” relationship. In some embodiments, the relationship between respective labels may be discovered using a program that automatically evaluates the relatedness/similarity between the words and phrases that compose the labels.
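The proper-subset rule given as an example above can be sketched as follows. The mapping of "hierarchical" to a child-of candidate, the exclusion of labels with no content, and the function name `infer_by_subset` are assumptions of this sketch rather than details given in the description.

```python
from itertools import permutations


def infer_by_subset(label_to_items):
    """Apply one example rule to every ordered pair of labels.

    If the items of label X are a non-empty proper subset of the items of
    label Y, propose (X, "child-of", Y) and (X, "related-to", Y) as candidates.
    """
    suggestions = []
    for x, y in permutations(sorted(label_to_items), 2):
        items_x, items_y = label_to_items[x], label_to_items[y]
        # `<` on sets tests for a proper subset; empty labels are skipped
        # because the empty set is a proper subset of every non-empty set.
        if items_x and items_x < items_y:
            suggestions.append((x, "child-of", y))
            suggestions.append((x, "related-to", y))
    return suggestions


# Illustrative data only.
sample = {"travel": {1, 2, 3}, "paris": {1, 2}, "work": {9}}
candidates = infer_by_subset(sample)
```

The result is a list of candidates, not a set of created relationships.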
For one or more pairings amongst the set of labels, relationships are inferred based on the examination (606). The inferred relationships are suggested to the user for creation (608). If the user accepts a suggestion (610—yes), then the corresponding relationship is created and the labels in the inferred relationship are associated with each other in accordance with the inferred relationship (612). If the suggestion is not accepted (610—no), then the suggested relationship is rejected (614).
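The suggest, accept, and reject steps (608 through 614) can be sketched as a simple review loop. The `accept` callable stands in for the user's decision in an interface; it and the function name `review_suggestions` are hypothetical.

```python
def review_suggestions(suggestions, accept, relationships):
    """Offer each inferred relationship to the user and record the outcome.

    suggestions: iterable of (tail, relationship, head) triples (606, 608).
    accept: callable returning True if the user accepts a suggestion (610).
    relationships: set of active triples, updated in place on acceptance (612).
    Returns the list of rejected suggestions (614).
    """
    rejected = []
    for suggestion in suggestions:
        if accept(suggestion):
            relationships.add(suggestion)
        else:
            rejected.append(suggestion)
    return rejected


# Illustrative run: the stand-in user accepts only child-of suggestions.
active = set()
rejected = review_suggestions(
    [("paris", "child-of", "travel"), ("paris", "related-to", "travel")],
    lambda suggestion: suggestion[1] == "child-of",
    active,
)
```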
In a large body of collaboratively-tagged data, there will be a lot of redundancy and discrepancy between tags. For example,
Conversely, if things tagged with the less-popular labels “SF” or “Frisco” are also tagged “San Francisco”, then that implies an 80% relationship in the other direction: a user looking at items tagged “SF” is 80% likely to also be interested in items tagged “San Francisco.” In these embodiments the set of things tagged with the more popular term mostly contains the set of things tagged with the less-popular term, so, in the present example, it can be assumed that the labels “SF” and “Frisco” are very likely to relate to the same thing as the label “San Francisco”.
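One way to quantify such a directional relationship is the fraction of items carrying one label that also carry another. The sketch below uses made-up tagging data chosen so that the figure comes out at 80% in one direction; it does not reproduce data from the description, and the function name `overlap` is an assumption.

```python
def overlap(label_to_items, source, target):
    """Fraction of the items tagged `source` that are also tagged `target`."""
    source_items = label_to_items.get(source, set())
    if not source_items:
        return 0.0
    shared = source_items & label_to_items.get(target, set())
    return len(shared) / len(source_items)


# Made-up tagging data, for illustration only.
tags = {
    "San Francisco": set(range(1, 11)),  # items 1 through 10
    "SF": {1, 2, 3, 4, 11},              # 4 of its 5 items are shared
}
sf_to_full = overlap(tags, "SF", "San Francisco")   # 4 / 5
full_to_sf = overlap(tags, "San Francisco", "SF")   # 4 / 10
```

The measure is asymmetric: most items tagged “SF” are also tagged “San Francisco”, while fewer than half of the items tagged “San Francisco” are tagged “SF”. That asymmetry is what lets the less-popular label be treated as pointing at the more popular one.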
In other words, some embodiments can use the redundancy and discrepancies amongst the terms or labels used by various users to tag information to suggest relationships between those terms or labels.
In the embodiments described above the relatedness between labels is derived from data entered by multiple users who are tagging/labeling the same set of data. An example of an application where this might occur is “image search,” where everyone is looking at the same pictures. Implied relationships derived from tags or labels can also be applied to situations where only one person does the tagging—such as in relation to a personal photo collection. This is because in a variety of embodiments the implied relationships derived from the tags can be applied to any set of tagged data based on knowledge of relationships between the tags, or labels.
To create a relationship, the user types in or selects a first label in the label menu 804, a second label in label menu 808, and types in or selects a relationship in the relationships menu 806. The user may then click a submit button 818 to create the relationship or click a cancel button 820 to cancel. If the selected relationship is a unidirectional relationship, then the first label may be treated as the “tail” and the second label as the “head” of the relationship.
The interface may also show a table 802 of active label relationships. The table 802 includes a tail label column 810, a relationships column 812, and a head label column 814, similar to the table data structure 700. The table 802 may also include checkboxes 816 where the user can indicate relationships to be removed (deleted) upon clicking of the submit button 818.
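The effect of the submit button on the table of active relationships can be sketched as a pure function over rows. Whether a single submit both adds a new row and removes checked rows is an assumption of this sketch, as are the function and parameter names.

```python
def submit(table, new_row=None, checked=()):
    """Return the relationship table after a submit.

    table: list of (tail, relationship, head) rows, matching the tail label,
        relationship, and head label columns.
    new_row: optional row built from the two label menus and the
        relationships menu; for a unidirectional relationship the first
        label is the tail and the second label is the head.
    checked: indexes of rows whose removal checkbox is ticked.
    """
    remove = set(checked)
    rows = [row for index, row in enumerate(table) if index not in remove]
    if new_row is not None and new_row not in rows:
        rows.append(new_row)
    return rows


# Illustrative data only.
table = [("B", "child-of", "A"), ("A", "related-to", "C")]
updated = submit(table, new_row=("A", "more-important-than", "D"), checked=[1])
```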
The interface may also include a tool for deleting labels (not shown). When a label is deleted, all relationships involving that label are deleted as well. Content associated with the deleted label remains but loses the deleted label.
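Label deletion as described above can be sketched with the same triple and mapping structures used for illustration elsewhere; the function name and argument layout are assumptions. The items themselves are never touched, so content survives and merely loses the deleted label.

```python
def delete_label(label, labels, relationships, label_to_items):
    """Remove a label, every relationship involving it, and its item mapping.

    labels: set of label names, updated in place.
    relationships: set of (tail, relationship, head) triples, updated in place.
    label_to_items: dict mapping labels to sets of item ids, updated in place.
    Returns the ids of the items that carried the deleted label.
    """
    labels.discard(label)
    relationships.difference_update(
        {triple for triple in relationships if label in (triple[0], triple[2])}
    )
    return label_to_items.pop(label, set())


# Illustrative data only.
labels = {"a", "b", "c"}
relationships = {
    ("a", "child-of", "b"),
    ("b", "related-to", "c"),
    ("c", "synonym-of", "a"),
}
label_to_items = {"a": {"x"}, "b": {"x", "y"}}
content = {"x": "first item", "y": "second item"}
unlabeled = delete_label("b", labels, relationships, label_to_items)
```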
Attention is now directed to applications of labels associated with each other in accordance with the embodiments described above.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Claims
1. A user interface method, comprising:
- displaying on an electronic display a plurality of labels that can be applied to data items stored on a computer system, such labels being distinct from a logical storage scheme associated with the data items;
- displaying on the electronic display a view of at least a subset of the data items;
- enabling the user to associate with particular ones of the data items selected from the view at least one of the labels;
- displaying on the electronic display a plurality of relationships that can exist between the labels;
- enabling the user to define for one or more pairs of the labels at least one of the relationships;
- enabling the user to select a specific label and, in response to such selection, displaying a view of at least a subset of the data items, wherein the subset of the data items is selected from: at least one of the data items associated with the specific label; and at least one of the data items associated with one or more other labels with a defined relationship to the specific label.
2. The method of claim 1, wherein the defined relationship is an arbitrary relationship between the specific label and the other label.
3. The method of claim 1, wherein the defined relationship comprises one of the group consisting of: a child-of relationship, a synonym-of relationship, a more-important-than relationship, a prerequisite-of relationship, a member-of relationship, a related-to relationship, and a current-version-of relationship.
4. The method of claim 1, wherein the user is a particular user in a plurality of users, further comprising:
- enabling a first group of at least two of the plurality of users to associate labels with at least a subset of the data items; and
- enabling a second group of at least two of the plurality of users to define relationships among at least a subset of the labels;
- wherein the subset of the data items displayed in response to selection by the particular user of one of the labels is based on the relationships defined for the one label by the second group.
5. The method of claim 4, further comprising: discovering relationships among a second subset of the labels in view of instances in which different ones of the second subset of the labels are assigned to a particular data item.
Type: Application
Filed: Sep 10, 2013
Publication Date: Apr 17, 2014
Applicant: Google Inc. (Mountain View, CA)
Inventors: Jed E. Hartman (Mountain View, CA), Clive Saha (San Francisco, CA), Astrid Atkinson (Boulder Creek, CA)
Application Number: 14/023,425
International Classification: G06F 17/30 (20060101); G06F 3/0482 (20060101);