SYSTEM AND METHOD FOR IDENTIFYING COMPUTER USERS HAVING FILES WITH COMMON ATTRIBUTES
A system and a method for identifying computer users having files with common attributes are provided. The method includes generating a first table having a set of attributes for each file in a first set of files associated with a first computer user. The set of attributes for each file in the first set of files has a plurality of attribute types. The method further includes generating a second table having a set of attributes for each file in a second set of files associated with a second computer user. The set of attributes for each file in the second set of files has the plurality of attribute types. The method further includes generating a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric. Finally, the method includes determining whether the first and second computer users have at least one file with common attributes, based on data in the similarity table.
This application relates to a system and a method for identifying computer users having files with common attributes.
BACKGROUND OF INVENTION
A growing problem in the realm of information technology is managing, organizing, finding, and making use of electronic data available within a business organization. Though data may exist within the business organization, it is often difficult to locate when needed. Consequently, much effort is employed to manage and organize information so that it may be easily found and used. Although search technologies have made it easier to find electronically available information that has already been structured and organized for public browsing on the Internet, finding information within a private computer network (intranet) remains difficult. For example, searching an intranet gives limited results in part because content creators are insufficiently or improperly motivated to make their content “interesting” (i.e., rich with links to related documents) or attractive. Consequently, few viewers are in turn motivated to link to such content.
Moreover, intranet web pages are ideally designed to provide information organized in an efficient, hierarchical structure, and do not necessarily aim to connect information to other information. Consequently, for many intranet searches only one content page contains the sought-for data, and few (sometimes zero) links point to that intranet web page from other pages. Making an intranet search even more difficult, intranet files often lack identifying characteristics that would make them stand out in a particular search.
Furthermore, some data available in an intranet is not search-engine-friendly or was never intended to be viewed directly. For example, data may be stored in locations that can't easily be crawled by a “web spider”, or data may be intended only to form a portion of a larger set of data.
Members of a business organization may wish to identify others in the organization having common interests and ideas, as suggested by their maintenance of identical or similar files. However, current search schemes generally provide results only for purposely published files.
Accordingly, the inventors herein have recognized a need for an improved system and method for identifying computer users having files with common attributes.
SUMMARY OF INVENTION
A method for identifying computer users having files with common attributes in accordance with an exemplary embodiment is provided. The method includes generating a first table having a set of attributes for each file in a first set of files associated with a first computer user. The set of attributes for each file in the first set of files has a plurality of attribute types. The method further includes generating a second table having a set of attributes for each file in a second set of files associated with a second computer user. The set of attributes for each file in the second set of files has the plurality of attribute types. The method further includes generating a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric. The method further includes determining whether the first and second computer users have at least one file with common attributes, based on data in the similarity table.
A system for identifying computer users having files with common attributes in accordance with another exemplary embodiment is provided. The system includes first and second computers operably communicating with one another. The system further includes a display device operably communicating with the first computer. The first computer is configured to generate a first table having a set of attributes for each file in a first set of files associated with a first computer user. The set of attributes for each file in the first set of files has a plurality of attribute types. The second computer is configured to generate a second table having a set of attributes for each file in a second set of files associated with a second computer user. The set of attributes for each file in the second set of files has the plurality of attribute types. The first computer is further configured to generate a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric. The first computer is further configured to determine whether the first and second computer users have at least one file with common attributes, based on data in the similarity table. The first computer is further configured to display a user identifier associated with at least one of the first and second computer users on the display device when the first and second computer users have at least one file with common attributes.
Referring to
Referring to
According to an exemplary embodiment, a table 52 is generated, including a set of attributes for each file in the set of files 50. A table 62 is also generated, including a set of attributes for each file in the set of files 60. The word “table” herein, without limiting its scope, may include a database, index, list, or other equivalent collection or collections of data.
Tables 52 and 62 include, for each file in sets of files 50 and 60 respectively, a set of attributes including username, file or directory name, and checksum. It is of course recognized that tables 52, 62 may include an alternative or different set of attributes for each file in the sets of files 50, 60. For example, and without meaning to limit the scope of attributes associated with each file, a set of attributes may include an associated user name, a file or directory name, a checksum value, a file location, a file size, a file creation date, a file modification date, a file access date, keywords from the file, and/or names of files located in the same directory.
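One way a table such as table 52 or 62 might be built is sketched below in Python. The field names (`username`, `name`, `location`, `size`, `checksum`), the helper names, and the choice of SHA-256 as the checksum are illustrative assumptions; the description does not fix a checksum algorithm or a table layout. The sketch walks a user's directory tree and records one attribute set per file.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileAttributes:
    """One row of an attribute table (hypothetical field names)."""
    username: str
    name: str
    location: str
    size: int
    checksum: str


def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_attribute_table(username: str, root: str) -> list[FileAttributes]:
    """Walk `root` and record one set of attributes for every file found."""
    table = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            table.append(FileAttributes(
                username=username,
                name=filename,
                location=os.path.relpath(dirpath, root),  # "." for the root itself
                size=os.path.getsize(path),
                checksum=file_checksum(path),
            ))
    return table
```

A call such as `build_attribute_table("user1", "/home/user1")` (hypothetical path) would yield the rows of a first table; the same call on the second user's files yields the second table. Other attribute types from the list above, such as modification dates, could be added as further fields.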
In one exemplary embodiment, a similarity table 70 is generated utilizing table 52, table 62, and a predetermined similarity metric. Similarity table 70 is configured to indicate an amount of similarity between files represented in table 52 and files represented in table 62. In one non-limiting example, the similarity metric compares the checksum value for each file represented in first table 52 with the checksum values for each file represented in second table 62 to generate similarity table 70. The similarity table may then be used to indicate files represented in tables 52, 62 that have an identical checksum, without requiring the files to have any other attributes in common (e.g., the files need not share an identical filename). For example, similarity table 70 shows a column of unique checksums found in tables 52 and 62. An indication of the computer users having a file associated with each unique checksum is also represented in similarity table 70. Similarity table 70 may optionally indicate whether multiple files having identical checksums are similarly named, located, owned, etc., depending on the attributes stored in tables 52 and 62. In this manner a computer user may, for example, locate copies of a particular file saved under different names, as is the case for “File X” of user 1 and “File D” of user 2. Additionally, or alternatively, the similarity table may be used to indicate the names or usernames of computer users having files with attributes in common, e.g., files having identical checksums.
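The checksum-keyed layout of similarity table 70, one entry per unique checksum listing every user and file name that carries it, can be sketched as follows. The `Row` tuple, the function names, and the short checksum strings are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Row(NamedTuple):
    """One attribute set: user name, file name, checksum (hypothetical layout)."""
    username: str
    name: str
    checksum: str


def similarity_by_checksum(*tables: list[Row]) -> dict[str, list[tuple[str, str]]]:
    """Map each unique checksum to the (username, file name) pairs that have it."""
    similarity = defaultdict(list)
    for table in tables:
        for row in table:
            similarity[row.checksum].append((row.username, row.name))
    return dict(similarity)


def shared_checksums(similarity: dict[str, list[tuple[str, str]]]) -> dict[str, list[tuple[str, str]]]:
    """Keep only checksums held by more than one distinct user."""
    return {checksum: holders for checksum, holders in similarity.items()
            if len({user for user, _ in holders}) > 1}


table_52 = [Row("user1", "File X", "9f2c"), Row("user1", "File Y", "11ab")]
table_62 = [Row("user2", "File D", "9f2c"), Row("user2", "File E", "77de")]
print(shared_checksums(similarity_by_checksum(table_52, table_62)))
# {'9f2c': [('user1', 'File X'), ('user2', 'File D')]}
```

The output mirrors the “File X” / “File D” case above: the two files match on checksum alone even though their names differ.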
In an alternative embodiment, the similarity metric is based on a collection of checksums of portions of first and second files: if at least a predetermined percentage of the portions of the first and second files are similar based on the associated checksums, the first and second files are identified as being similar.
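The portion-checksum metric might look like the sketch below. The fixed portion size, the 80% default threshold, SHA-256, and the decision to match portions regardless of order (a multiset intersection of checksums) are all assumptions; the description only requires that a predetermined percentage of portions match.

```python
import hashlib
from collections import Counter


def portion_checksums(data: bytes, portion_size: int = 4096) -> list[str]:
    """Return one SHA-256 digest per fixed-size portion of `data`."""
    return [hashlib.sha256(data[i:i + portion_size]).hexdigest()
            for i in range(0, len(data), portion_size)]


def portions_similar(a: bytes, b: bytes, threshold: float = 0.8,
                     portion_size: int = 4096) -> bool:
    """True if at least `threshold` of the portions have matching checksums.

    Each portion checksum of `a` can pair with at most one equal portion
    checksum of `b`, so matching is order-independent.
    """
    ca = portion_checksums(a, portion_size)
    cb = portion_checksums(b, portion_size)
    total = max(len(ca), len(cb))
    if total == 0:
        return True  # two empty files are trivially identical
    matches = sum((Counter(ca) & Counter(cb)).values())
    return matches / total >= threshold
```

With one-byte portions, `b"abcde"` and `b"abcdf"` share 4 of 5 portions, so they count as similar at an 80% threshold but not at 90%. A whole-file checksum, by contrast, would treat them as entirely different.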
Referring to
At step 100, the computer 20 generates a first table 52 having a set of attributes for each file in a first set of files associated with a first computer user. The set of attributes for each file in the first set of files has a plurality of attribute types.
At step 102, the computer 20 generates a second table 62 having a set of attributes for each file in a second set of files associated with a second computer user. The set of attributes for each file in the second set of files has the plurality of attribute types.
At step 103, the computer 10 generates a similarity table 70 by comparing each set of attributes in the first table 52 with each set of attributes in the second table 62, utilizing a predetermined similarity metric.
At step 104, the computer 10 determines whether the first and second computer users have at least one file with common attributes, based on data in the similarity table 70.
At step 105, the computer 10 displays the names of the first and second computer users on a display device 16 when the first and second computer users have at least one file with common attributes. After step 105, the method is exited.
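Steps 103 through 105 can be condensed into a single function, sketched below under the assumption that rows are `(username, file name, checksum)` tuples and that the similarity metric is checksum equality. The function name and row layout are hypothetical. The nested comprehension compares each set of attributes in the first table with each set in the second, as step 103 describes.

```python
from typing import Optional

Row = tuple[str, str, str]  # (username, file name, checksum); hypothetical layout


def users_with_common_files(first_table: list[Row],
                            second_table: list[Row]) -> Optional[list[str]]:
    # Step 103: compare every attribute set in the first table with every
    # attribute set in the second; here the metric is "checksums are equal".
    similarity_table = [(a, b) for a in first_table for b in second_table
                        if a[2] == b[2]]
    # Step 104: the users have a file with common attributes if any pair matched.
    if not similarity_table:
        return None
    # Step 105: the user names that would be shown on the display device.
    return sorted({a[0] for a, _ in similarity_table} |
                  {b[0] for _, b in similarity_table})


first = [("user1", "File X", "9f2c"), ("user1", "File Y", "11ab")]
second = [("user2", "File D", "9f2c")]
print(users_with_common_files(first, second))  # ['user1', 'user2']
```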
Referring now to
Table 138 is generated from tables 132 and 136 and is configured to include only sets of attributes from table 136 that are not present in table 132. For example, since “System files” appears in both table 132 and 136, it does not appear in the temporal differencing table 138. Also, as illustrated in
Referring to
A similarity table 150, shown in
In one non-limiting example, the similarity metric utilizes filename and file location attributes from tables 138 and 148 to determine an amount of similarity between files. For example,
Referring to
Referring now to
At step 200, the computers 10 and 20 generate the first table 132 and the second table 142, respectively, comprising a set of attributes for each of first and second sets of files 130, 140, respectively, associated with first and second users, respectively.
At step 202, after generating the first and second tables 132, 142, the computers 10 and 20 generate third and fourth tables 136, 146 comprising a set of attributes for each of a third and fourth set of files, respectively, associated with the first and second computer users, respectively. It should be noted that table 136 can be generated at a different time than generation of table 132. Further, table 146 can be generated at a time different than generation of table 142.
At step 204, computers 10 and 20 generate first and second difference tables 138, 148, respectively, associated with the first and second computer users, respectively. The difference table 138 includes the sets of attributes from table 136 that are not included identically in table 132. The difference table 148 includes the sets of attributes from table 146 that are not included identically in table 142.
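A difference table is a set difference over whole attribute sets: a row of the later table survives only if no identical row exists in the earlier table. The sketch below assumes rows are hashable tuples; the function name and the example paths are hypothetical.

```python
def difference_table(earlier: list[tuple], later: list[tuple]) -> list[tuple]:
    """Return attribute sets in `later` that are not present identically in `earlier`.

    Rows must be hashable (e.g. tuples) so `earlier` can be turned into a set.
    """
    earlier_rows = set(earlier)
    return [row for row in later if row not in earlier_rows]


table_132 = [("user1", "System files", "/system")]
table_136 = [("user1", "System files", "/system"),
             ("user1", "Project plan", "/projects")]
print(difference_table(table_132, table_136))
# [('user1', 'Project plan', '/projects')]
```

As in the “System files” example for table 138, an entry present at both times drops out, leaving only what the user added or changed in between. Because the comparison is on the whole row, a file whose checksum or location changed also reappears in the difference table.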
At step 206, the computer 10 generates a similarity table 150 based on the first and second difference tables 138, 148, utilizing a similarity metric. In particular, the computer 10 compares the sets of attributes in table 138 with the sets of attributes in table 148, utilizing a predetermined similarity metric to generate the similarity table 150.
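For the filename-and-location metric mentioned for tables 138 and 148, the comparison can be sketched as an equality join on the chosen attribute types. Rows are assumed to be dictionaries with hypothetical keys `user`, `name`, and `location`. The sketch indexes the second table by the chosen fields instead of looping over every pair, which yields the same matching pairs as an all-pairs comparison.

```python
def similarity_on(fields: tuple[str, ...], first_diff: list[dict],
                  second_diff: list[dict]) -> list[tuple[dict, dict]]:
    """Pair rows of two difference tables whose chosen fields are all equal."""
    index: dict[tuple, list[dict]] = {}
    for row in second_diff:
        index.setdefault(tuple(row[f] for f in fields), []).append(row)
    return [(a, b) for a in first_diff
            for b in index.get(tuple(a[f] for f in fields), [])]


table_138 = [{"user": "user1", "name": "report.doc", "location": "/projects"}]
table_148 = [{"user": "user2", "name": "report.doc", "location": "/projects"},
             {"user": "user2", "name": "report.doc", "location": "/tmp"}]
pairs = similarity_on(("name", "location"), table_138, table_148)
print(len(pairs))  # 1
```

Passing `("name",)` instead would also match the copy in `/tmp`, so the choice of fields is the similarity metric in this sketch.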
At step 208, the computer 10 receives at least one search attribute from a search user. The search user can be either the first user or the second user. The search attribute corresponds to an attribute type contained in similarity table 150.
At step 210, the computer 10 displays on the display device 16 the filename(s) associated with each set of attributes in the search user's difference table that corresponds with the search attribute.
At step 212, the computer displays on the display device 16 the username(s) associated with each set of attributes in the first or second difference table that corresponds with the search attribute. After step 212, the method is exited.
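Steps 208 through 212 amount to filtering the difference tables on one attribute value and then projecting file names (for the search user) and user names (for everyone). The dictionary row layout, key names, and example values below are hypothetical.

```python
def search(diff_tables: list[list[dict]], search_user: str,
           attribute: str, value: str) -> tuple[list[str], list[str]]:
    """Return (search user's matching file names, users with matching rows)."""
    # Step 208: the search attribute selects rows whose `attribute` equals `value`.
    matches = [row for table in diff_tables for row in table
               if row.get(attribute) == value]
    # Step 210: file names from the search user's own difference table.
    filenames = sorted({row["name"] for row in matches if row["user"] == search_user})
    # Step 212: user names from either difference table.
    usernames = sorted({row["user"] for row in matches})
    return filenames, usernames


diff_138 = [{"user": "user1", "name": "plan.doc", "location": "/alpha"},
            {"user": "user1", "name": "notes.txt", "location": "/beta"}]
diff_148 = [{"user": "user2", "name": "spec.doc", "location": "/alpha"}]
print(search([diff_138, diff_148], "user1", "location", "/alpha"))
# (['plan.doc'], ['user1', 'user2'])
```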
It should be noted that in an alternate embodiment, an inferred relationship metric could be utilized to find files of first and second users having common attributes. An inferred relationship metric is a metric associated with an organization of files. For example, an inferred relationship metric could be a grouping of files in a folder. Further, for example, if User 1 and User 2 have “File Z” in common, the fact that User 2 also places “File C” and “File H” in the same folder as “File Z” may suggest an inferred relationship between “File C,” “File H,” and “File Z.”
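The folder-grouping example can be sketched as follows: find the folders where the other user keeps the shared file, then report the other files in those folders. Rows are assumed to be dictionaries with hypothetical `name` and `location` keys, and the folder paths are illustrative.

```python
def inferred_related(common_file: str, other_user_rows: list[dict]) -> list[str]:
    """Files the other user keeps in the same folder(s) as `common_file`."""
    folders = {r["location"] for r in other_user_rows if r["name"] == common_file}
    return sorted(r["name"] for r in other_user_rows
                  if r["location"] in folders and r["name"] != common_file)


user2_rows = [{"name": "File Z", "location": "/shared"},
              {"name": "File C", "location": "/shared"},
              {"name": "File H", "location": "/shared"},
              {"name": "File Q", "location": "/other"}]
print(inferred_related("File Z", user2_rows))  # ['File C', 'File H']
```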
It is of course appreciated that the foregoing embodiments may be extended without limitation to generate tables and results associated with sets of files associated with more than two computer users within a computer network. It should be noted that in an alternative embodiment, the foregoing tables and results can be determined utilizing a third external computer or computer server, communicating with first and second computers that store the files associated with first and second computer users, respectively.
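One straightforward extension to more than two users is to compare every pair of users, as in the sketch below. The dictionary of per-user tables, the `key` function (here selecting a checksum), and the example data are assumptions. With many users, a single checksum-keyed table covering everyone, like similarity table 70, avoids the quadratic number of pairs.

```python
from itertools import combinations
from typing import Callable


def pairwise_common(tables: dict[str, list[tuple]],
                    key: Callable[[tuple], str]) -> dict[tuple[str, str], set[str]]:
    """Return {(user_a, user_b): shared key values} for every pair sharing any."""
    keys = {user: {key(row) for row in rows} for user, rows in tables.items()}
    return {(u, v): keys[u] & keys[v]
            for u, v in combinations(sorted(keys), 2) if keys[u] & keys[v]}


tables = {"user1": [("File X", "9f2c")],
          "user2": [("File D", "9f2c")],
          "user3": [("File Q", "77de")]}
print(pairwise_common(tables, key=lambda row: row[1]))
# {('user1', 'user2'): {'9f2c'}}
```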
The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof. As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately. Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.
The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
The system and methods for identifying computer users having files with common attributes provide a substantial advantage over other systems and methods. In particular, the system and methods provide a technical effect of enabling intranet users to find file resources in an intranet which are not otherwise sufficiently available, utilizing a similarity table which relates attributes of a file to the attributes of another file. Another effect of the system and methods is that computer users are able to identify other computer users having similar files.
While the invention is described with reference to an exemplary embodiment, it will be understood by those skilled in the art that various changes may be made and equivalent elements may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation to the teachings of the invention without departing from the scope thereof. Therefore, it is intended that the invention not be limited to the embodiment disclosed for carrying out this invention, but that the invention include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order of importance; rather, the terms first, second, etc. are used to distinguish one element from another.
Claims
1. A method for identifying computer users having files with common attributes, comprising:
- generating a first table having a set of attributes for each file in a first set of files associated with a first computer user, the set of attributes for each file in the first set of files having a plurality of attribute types;
- generating a second table having a set of attributes for each file in a second set of files associated with a second computer user, the set of attributes for each file in the second set of files having the plurality of attribute types;
- generating a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric; and
- determining whether the first and second computer users have at least one file with common attributes, based on data in the similarity table.
2. The method of claim 1, wherein the first and second sets of files are stored on first and second computers, respectively.
3. The method of claim 1, wherein the set of attributes in the first table includes at least one of a user name, a filename, a file size, a file type, a file creation date, a file modification date, a file location, a checksum value associated with a file, and a collection of checksum values associated with portions of a file.
4. The method of claim 1, wherein the similarity metric is based on a quantity of checksum values in the first table that correspond to checksum values in the second table.
5. The method of claim 1, further comprising:
- generating a third table at a first time having a set of attributes for each file in a third set of files associated with the first computer user, the set of attributes for each file in the third set of files having the plurality of attribute types;
- generating a fourth table at a second time after the first time having a set of attributes for each file in a fourth set of files associated with the first computer user, the set of attributes for each file in the fourth set of files having the plurality of attribute types;
- generating the first table having only sets of attributes contained in the fourth table that are not contained in the third table;
- generating a fifth table at a third time having a set of attributes for each file in a fifth set of files associated with the second computer user, the set of attributes for each file in the fifth set of files having the plurality of attribute types;
- generating a sixth table at a fourth time after the third time having a set of attributes for each file in a sixth set of files associated with the second computer user, the set of attributes for each file in the sixth set of files having the plurality of attribute types; and
- generating the second table having only sets of attributes contained in the sixth table that are not contained in the fifth table.
6. The method of claim 1, further comprising:
- receiving a first file attribute that corresponds with a first file associated with the first computer user; and
- indicating a name of the second computer user associated with the second set of files wherein at least one file in the second set of files corresponds to the first file, utilizing the similarity table.
7. The method of claim 6, further comprising indicating one or more related files that are associated with the second computer user, wherein the related files are determined to correspond to the first file by utilizing a predetermined inferred relationship metric.
8. A system for identifying computer users having files with common attributes, comprising:
- first and second computers operably communicating with one another;
- a display device operably communicating with the first computer, the first computer configured to generate a first table having a set of attributes for each file in a first set of files associated with a first computer user, the set of attributes for each file in the first set of files having a plurality of attribute types, the second computer further configured to generate a second table having a set of attributes for each file in a second set of files associated with a second computer user, the set of attributes for each file in the second set of files having the plurality of attribute types, the first computer further configured to generate a similarity table by comparing each set of attributes in the first table with each set of attributes in the second table, utilizing a predetermined similarity metric, the first computer further configured to determine whether the first and second computer users have at least one file with common attributes, based on data in the similarity table, the first computer further configured to display a user identifier associated with at least one of the first and second computer users on the display device when the first and second computer users have at least one file with common attributes.
9. The system of claim 8, wherein the first computer is further configured to display a file name of the at least one file with common attributes on the display device.
Type: Application
Filed: Nov 21, 2006
Publication Date: May 22, 2008
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY)
Inventors: Clemens Drews (San Jose, CA), Tessa Ann Lau (Mountain View, CA), James Lin (Cupertino, CA), John C. Tang (Palo Alto, CA)
Application Number: 11/562,084
International Classification: G06F 17/30 (20060101);