Searching for data objects
Searching for data objects includes searching a content index for first data objects, where the content index stores content of plural data objects, where the first data objects contain content that corresponds to content of a search query, and where the first data objects define a first search result set. The search for data objects also includes searching an attribute index for second data objects, where the attribute index stores access control attributes for plural data objects, where the second data objects have access control attributes that correspond to user access data associated with the search query, and where the second data objects define a second search result set. A search result is obtained based on the first search result set and the second search result set.
This patent application claims priority to German Patent Application No. DE 102004038544.0, filed on Aug. 6, 2004, the contents of which are hereby incorporated by reference into this patent application as if set forth herein in full.
TECHNICAL FIELD
This patent application relates generally to searching for data objects.
BACKGROUND
Enterprise solutions use data objects to represent real world items and other information. For example, data objects can be used to store content including, but not limited to, text, images, graphics, video, and/or multimedia. A business object is a special type of data object that can be used to represent information related to a business including, but not limited to, folders, documents, bookmarks, materials, material lists, item lists, data sheets, presentations, and text documents.
Data objects can be used to support collaborations, competitions, or official tenders. For example, a public tender can be implemented using a public folder, in which each bidding party can have a reserved area with folders. Inside these folders, bidding parties can read tender documents for particular items, store their bids, and store additional data and communications between parties. Since the data objects are stored centrally, access rights to the data objects need to be managed. For example, it should be possible to prevent certain users from accessing certain data objects. In particular, two bidding parties should not be able to access each other's bid or communication objects.
Data objects also store metadata. Metadata is data that is not directly related to object content. Metadata can include object attributes, such as object identifiers, object names, time stamps, access control attributes, object locations, times of object creation, creator names, object change times, identities of users making changes, and the like. Access control attributes may specify who has access to a particular data object.
In more detail, the access control attributes can specify the identity of a user, a group of users, and/or users having a particular role, who have access to data objects. The access control attributes can also include access right information. Access right information specifies an extent of access. For example, certain users may be permitted only to read a data object, whereas other users may be permitted both to read and to edit the data object.
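The access control attributes can be pictured as a list of entries, each pairing a principal (a user, a group, or a role) with an access right. The following is a minimal sketch, assuming that the three access levels named later in this description (read, read and write, full) can be ordered so that a higher level implies the lower ones; the class and field names are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessRight(IntEnum):
    """Ordered access levels (assumption): a higher value implies every lower one."""
    READ = 1
    READ_WRITE = 2
    FULL = 3


@dataclass(frozen=True)
class AccessControlEntry:
    """One access control attribute: a principal plus the right it is granted."""
    principal_type: str  # hypothetical labels: "user", "group", or "role"
    principal: str
    right: AccessRight


def permits(entry: AccessControlEntry, required: AccessRight) -> bool:
    """True if the entry grants at least the required level of access."""
    return entry.right >= required


# A data object may carry several entries, e.g. read for a group, edit for one user.
acl = [
    AccessControlEntry("group", "bidder_a", AccessRight.READ),
    AccessControlEntry("user", "alice", AccessRight.READ_WRITE),
]
```

In this sketch, a member of group bidder_a may read the object, while user alice may both read and edit it.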
Search engines may be used to search databases, including databases that contain data objects. One disadvantage of conventional search engines is that they do not take user access rights into account when returning search results.
SUMMARY
This patent application features methods, systems, and apparatus, including computer program products, for use in searching for data objects. Aspects of the methods, systems, and apparatus are set forth below.
In general, in one aspect, the invention is directed to searching for data objects. This aspect includes searching a content index for first data objects, where the content index stores content of plural data objects, where the first data objects contain content that corresponds to content of a search query, and where the first data objects define a first search result set. The search for data objects also includes searching an attribute index for second data objects, where the attribute index stores access control attributes for plural data objects, where the second data objects have access control attributes that correspond to user access data associated with the search query, and where the second data objects define a second search result set. A search result is obtained based on the first search result set and the second search result set. This aspect may also include one or more of the following.
The search result may include pointers to at least some of the first and/or second data objects. The search result may be obtained by merging the first search result set and the second search result set. Merging the first search result set and the second search result set may include identifying data objects having a predetermined relationship to the first search result set and the second search result set. Merging the first search result set and the second search result set may include performing a logical AND operation on the first search result set and the second search result set to produce a resultant set.
The resultant set may be merged with an additional search result set. Merging the resultant set with the additional search result set may include performing a logical OR operation on the resultant set and the additional search result set. The additional search result set may be obtained by searching the attribute index for data objects that have access control attributes that correspond to the user access data and that have metadata that corresponds to the content associated with the search query.
The access control attributes may be metadata, and may include user access right information. The access control attributes may include at least one of a user name, a user role, a user group name, and at least one user access right. The user access right may include at least one of read access, read and write access, and full access.
The content index may be generated by indexing text of data objects. The attribute index may be generated by indexing metadata of data objects. The attribute index may be updated by indexing only changes to the access control attributes.
The details of one or more examples are set forth in the accompanying drawings and the description below. Further features, aspects, and advantages of the invention will become apparent from the description, the drawings, and the claims.
DESCRIPTION OF THE DRAWINGS
Like reference numerals in different figures denote like elements.
DETAILED DESCRIPTION
Registration page 260a can be used to register Web sites with Web server 200. In this example, Web server 200 operates as middleware between registration page 260a and database server 201. Once a Web site is registered, its address, in this case its uniform resource locator (URL), is entered into index server 202. Index server 202 can, however, reject certain registrations. For example, index server 202 can include functionality to block URLs of Web sites, such as “spam sites”, from being indexed.
URLs in index server 202 are processed by robot server 203. Robot server 203 generates robot programs for searching corresponding Web sites hosted by host computers 204, 205, 206. These robot programs analyze the content and attributes of data objects in the Web sites, and provide the results of their analyses to index server 202. Index server 202 uses these results to index content and the attributes of the data objects.
To search, a user enters a search query on search page 260b. The search query is sent to index server 202, where the search query is processed. Data objects that meet one or more criteria of the search query are located, information relating to the objects is retrieved, and search results containing this information are presented to the user. Database server 201 can be used to present page 260b and search results to the user.
Index server 202 is used in connection with a search engine 212 that may be implemented using system 199 or portions thereof, as shown in
Index server 202 includes a text mining engine 202a and corresponding index 202d, a text search engine 202b and corresponding index 202e, and an attribute engine 202c and corresponding index 202f. Indices 202d and 202e can be content indexes, which enable full-text searches of content of data objects. Index 202f can be an attribute index, which indexes attributes, rather than content, of data objects. As a result, index 202f enables searches for data objects based on their attributes, rather than their content.
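The difference between a content index (such as indices 202d and 202e) and an attribute index (such as index 202f) can be illustrated with two inverted indices: one keyed by words of content and one keyed by attribute name/value pairs. This is a minimal in-memory sketch, assuming simple lower-case word tokenization; it is not the implementation of the text mining, text search, or attribute engines, and all names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately simple stand-in for real text analysis."""
    return re.findall(r"\w+", text.lower())


class ContentIndex:
    """Inverted index: term -> IDs of data objects whose content contains the term."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, object_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(object_id)

    def lookup(self, term: str) -> set[str]:
        # .get() on a defaultdict does not create an empty entry for unknown terms
        return set(self.postings.get(term.lower(), set()))


class AttributeIndex:
    """Attribute index: (attribute name, value) -> IDs of data objects with that value."""

    def __init__(self) -> None:
        self.postings: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, object_id: str, attributes: dict[str, str]) -> None:
        for name, value in attributes.items():
            self.postings[(name, value)].add(object_id)

    def lookup(self, name: str, value: str) -> set[str]:
        return set(self.postings.get((name, value), set()))
```

A full-text query consults only the content index, whereas a query such as "all objects in folder F1" consults only the attribute index.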
Attribute index 404 stores information about each document, as opposed to document content. This information can be stored as metadata and may include, e.g., attributes of the document, a name of the document, and access control attributes associated with the document. The attributes can include object identifiers (IDs), collaboration IDs, topic IDs, folder IDs, and area IDs. The access control attributes can include a user name, a user group, and/or a user role in combination with an access right for a corresponding data object. The access right specifies the extent to which the document can be accessed, such as whether a user or group is granted read access, read and write access, or full access to a particular document.
Referring back to
Search engine 212 receives (304) a search query. An example of a search query 500 is shown in
Search query 500 also includes user access data 504. User access data 504 can include data that identifies the user who generated the search query. For example, user access data 504 can identify the “current user”, a “current group” that the user is in, and/or a “current role” of the user in that group. In this regard, the user can participate in several groups and can have several roles, each of which can be reflected in search query 500.
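A search query carrying both a search statement and user access data might be modeled as follows. This is a minimal sketch; the field names and the (principal type, name) pairs are assumptions chosen for illustration, not the format of search query 500.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAccessData:
    """Identifies who issued the query: the user plus every group and role they hold."""
    user: str
    groups: frozenset[str] = frozenset()
    roles: frozenset[str] = frozenset()

    def principals(self) -> set[tuple[str, str]]:
        """All (principal type, name) pairs that may appear in access control attributes."""
        result = {("user", self.user)}
        result |= {("group", g) for g in self.groups}
        result |= {("role", r) for r in self.roles}
        return result


@dataclass(frozen=True)
class SearchQuery:
    """A search statement together with the user access data of the requester."""
    statement: str           # plays the part of search statement 502
    access: UserAccessData   # plays the part of user access data 504
```

Because a user can belong to several groups and hold several roles, principals() returns a set rather than a single identity.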
After receiving the search query, process 299 searches (306) content index 406 for data objects that correspond to parameters of search statement 502. Searching for data objects within content index 406 includes searching for data objects with elements of content that at least partially meet (e.g., match, or are similar to) parameters of the search statement. All data objects that are obtained (310) by searching content index 406 constitute a first search result set. The first search result set can include the data objects themselves, IDs or other indicia associated with the data objects, or links or pointers to the data objects. The links or pointers can be references to storage locations of the data objects. Fuzzy logic may be used to perform the search. Such a search may identify data objects with elements of data that are similar, in some way, to parameters of the search query (as opposed to matching the parameters of the search query exactly).
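A search of a content index that accepts either exact or similar terms could look like the sketch below. The description does not specify a fuzzy matching method; this example substitutes the similarity ratio of Python's standard difflib module, and it assumes that an object matches if it contains any term of the search statement. The function name and the cutoff value are hypothetical.

```python
import difflib


def search_content(postings: dict[str, set[str]], statement: str,
                   fuzzy: bool = False, cutoff: float = 0.8) -> set[str]:
    """Return IDs of data objects whose indexed terms match any term of the statement.

    postings maps an index term to the IDs of objects containing it. With
    fuzzy=True, a query term also matches index terms whose difflib similarity
    ratio is at least `cutoff`, instead of only identical terms.
    """
    first_result_set: set[str] = set()
    for term in statement.lower().split():
        if fuzzy:
            matches = difflib.get_close_matches(term, list(postings), n=5, cutoff=cutoff)
        else:
            matches = [term] if term in postings else []
        for match in matches:
            first_result_set |= postings[match]
    return first_result_set
```

With this sketch, the misspelled statement "tendr" finds objects indexed under "tender" only when fuzzy matching is enabled.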
In addition to searching content index 406, process 299 searches (308) attribute index 404 for data objects that correspond to user access data 504. That is, process 299 searches for data objects whose access control attributes match the user name, groups, and/or roles identified by user access data 504.
Search engine 212 can search for data objects with access control attributes containing the user's name, the user's groups, and/or the user's roles. Process 299 can also determine whether access right information for the user's name, group, and/or role specifies at least read access. Read access means that the user is permitted to read information but not to write it. Higher levels of access include read and write access, in which a user is allowed both to read and to write the information, and full access, in which the user is permitted virtually unfettered access to the information. It is also possible to require that every matching user name, group, and/or role stored in the access control attributes grant at least read access before the user is allowed to access the corresponding data objects.
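The attribute search described above can be sketched as a filter over per-object access control lists. This is an illustration under stated assumptions: the attribute index is modeled as a plain dictionary, access levels are ordered as read < read and write < full, and the require_all flag is one interpretation of the stricter option in which every matching user name, group, and role must grant the required access. All names are hypothetical.

```python
from enum import IntEnum


class AccessRight(IntEnum):
    READ = 1
    READ_WRITE = 2
    FULL = 3


def search_attributes(acl_index: dict[str, dict[tuple[str, str], AccessRight]],
                      principals: set[tuple[str, str]],
                      required: AccessRight = AccessRight.READ,
                      require_all: bool = False) -> set[str]:
    """Return IDs of data objects the user may access (the second search result set).

    acl_index maps object ID -> {(principal type, name): granted right}.
    By default one matching principal with at least `required` access suffices;
    with require_all=True, every matching principal must grant at least `required`.
    """
    second_result_set: set[str] = set()
    for object_id, acl in acl_index.items():
        granted = [acl[p] for p in principals if p in acl]
        if not granted:
            continue  # no access control attribute names this user, group, or role
        checks = [right >= required for right in granted]
        allowed = all(checks) if require_all else any(checks)
        if allowed:
            second_result_set.add(object_id)
    return second_result_set
```

Objects whose access control attributes do not mention any of the user's principals are excluded in both modes.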
Search engine 212 can also search in attribute index 404 for areas, folders, collaborations, and/or other hierarchical objects, which correspond to the user name, the user groups, and/or the user roles, and which grant at least read access to the user.
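One way to use accessible areas, folders, or collaborations is to treat every object located inside such a container as accessible. The description does not state how access is inherited, so the following is only a sketch under that assumption, with the hierarchy modeled as a hypothetical parent map (each node mapped to its parent, or None at the root).

```python
from typing import Optional


def objects_in_accessible_containers(parent: dict[str, Optional[str]],
                                     accessible_containers: set[str]) -> set[str]:
    """Return nodes that lie, directly or transitively, inside an accessible container.

    parent maps each node (object, folder, area, collaboration) to its parent,
    or to None at the root. The hierarchy is assumed to be a tree (no cycles).
    """
    result: set[str] = set()
    for node in parent:
        ancestor = parent[node]
        while ancestor is not None:
            if ancestor in accessible_containers:
                result.add(node)
                break
            ancestor = parent.get(ancestor)  # None once the root is passed
    return result
```

Under this assumption, granting a bidding party read access to its reserved area makes every folder and document in that area searchable for that party.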
All data objects that are obtained (312) as a result of searches performed on attribute index 404 constitute a second search result set. This second search result set can identify, e.g., all data objects for which the user has at least read access. The second search result set can include the data objects themselves, IDs or other indicia associated with the data objects, or links or pointers to the data objects. The links or pointers can be references to storage locations of the data objects. Fuzzy logic may be used to perform the search.
Process 299 merges (314) the first search result set and the second search result set to thereby obtain an overall search result set. This overall search result set is presented to the user as the end-result of the user's query using the search engine. Merging the first and second search result sets can include intersecting them, e.g., by a logical AND operation, so that the overall search result set contains only data objects that both meet the search query and are accessible to the user. Other methods of merging search result sets may also be used.
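The logical AND merge can be sketched as a filter. The description does not say whether the content search returns a ranked list; this sketch assumes it does and keeps that ranking while dropping every object that is not in the second search result set. The function name is hypothetical.

```python
def merge_results(first_result_set: list[str], second_result_set: set[str]) -> list[str]:
    """Logical AND of the two result sets, keeping the content search's order.

    first_result_set: object IDs from the content index, best match first (assumed).
    second_result_set: object IDs the user has at least read access to.
    """
    allowed = set(second_result_set)
    return [object_id for object_id in first_result_set if object_id in allowed]
```

Filtering the ranked list, rather than computing a plain set intersection, preserves relevance order at the same cost.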
By way of example, a user may want to search for data objects that were created at a particular time. This information can be entered into search engine 212 as a search query. Because the time of creation is stored in the metadata of data objects rather than in their content, it is useful in some cases to search the metadata of data objects using the search statement. Results of such a search of a content index 903 can constitute an additional search result set.
In this example, search result set 910 can be determined by using logical OR 908 to merge the additional search result set with the resultant set produced by logical AND 906. Data objects in search result set 910 therefore either have content that meets the search query or have metadata that meets the search query. In either case, the user has at least read access to them.
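The combination of logical AND 906 and logical OR 908 can be expressed directly with set operations. This minimal sketch assumes, as in the summary above, that the additional (metadata) search result set is also restricted to objects the user may read; the function and parameter names are hypothetical.

```python
def overall_result(content_hits: set[str], metadata_hits: set[str],
                   readable: set[str]) -> set[str]:
    """(content AND access) OR (metadata AND access)."""
    resultant_set = content_hits & readable    # corresponds to logical AND 906
    additional_set = metadata_hits & readable  # metadata search restricted to readable objects
    return resultant_set | additional_set      # corresponds to logical OR 908
```

By the distributive law this equals (content_hits | metadata_hits) & readable, so no object outside the user's access can appear in the result, whichever search found it.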
As noted above,
According to process 701, an index may be generated from a predetermined set of data objects. To do this, an administration module 702 sends a request, FullIndexing, for a full indexing 710, meaning all objects are to be indexed. A dispatching module 704 processes this request to obtain a message, GetObjectIdList, that is used to retrieve IDs of data objects to be indexed. In this case, all available data objects are to be indexed; however, in other cases, a subset of data objects may be targeted for indexing.
The foregoing GetObjectIdList message is processed by a server 706, which stores the data objects, e.g., business objects for a collaboration folder (cFolders) structure. Server 706 instantiates the data objects to obtain the data objects' IDs. The data objects' metadata is also obtained from the instantiated objects, and stored by server 706.
Server 706 returns, to dispatching module 704, a table, TabObjectsIds, which lists the data objects' IDs. Module 704 uses this table to request, via a GetAttributesNameList message, an attribute name list for the data objects from server 706. In response, server 706 returns, to dispatching module 704, a table, TabAttributeNames. This table contains names of attributes of the data objects listed in table TabObjectsIds.
The attributes' names are used to request attribute values for the named attributes. Module 704 sends a message, GetAttributeValues, to server 706 and, in response, receives a table, TabAttributeValues, of attribute values for the named attributes. The data object IDs, together with their attributes and attribute values, are provided to an index server 708 of a search engine via a FeedIndex message. Index server 708 generates, and stores, the index for the named data objects. Index server 708 reports successful generation of an index to administration module 702 and status information to module 704. It is noted that index server 708 corresponds to index server 202 of
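The message sequence of process 701 can be sketched as plain method calls. In this illustration, ObjectServer and IndexServer are hypothetical in-memory stand-ins for server 706 and index server 708, the method names are invented, and the comments map each call to the message or table named above; the real messages and their payload formats are not specified here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ObjectServer:
    """Stand-in for server 706: data objects as {object ID: {attribute: value}}."""
    objects: dict[str, dict[str, Any]]

    def get_object_id_list(self) -> list[str]:  # GetObjectIdList
        return sorted(self.objects)

    def get_attribute_name_list(self, ids: list[str]) -> dict[str, list[str]]:  # GetAttributesNameList
        return {i: sorted(self.objects[i]) for i in ids}

    def get_attribute_values(self, names: dict[str, list[str]]) -> dict[str, dict[str, Any]]:  # GetAttributeValues
        return {i: {n: self.objects[i][n] for n in attrs} for i, attrs in names.items()}


@dataclass
class IndexServer:
    """Stand-in for index server 708: stores one attribute record per object ID."""
    index: dict[str, dict[str, Any]] = field(default_factory=dict)

    def feed_index(self, values: dict[str, dict[str, Any]]) -> bool:  # FeedIndex
        self.index.update(values)
        return True  # success is reported back to the administration module


def full_indexing(server: ObjectServer, index_server: IndexServer) -> bool:
    """Dispatching module 704 handling a FullIndexing request."""
    tab_object_ids = server.get_object_id_list()                             # TabObjectsIds
    tab_attribute_names = server.get_attribute_name_list(tab_object_ids)     # TabAttributeNames
    tab_attribute_values = server.get_attribute_values(tab_attribute_names)  # TabAttributeValues
    return index_server.feed_index(tab_attribute_values)
```

Separating the name request from the value request mirrors the two round trips described above, at the cost of an extra call per indexing run.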
The attribute information may include a hierarchical folder structure for data objects. This information enables searching to be limited to a particular area, folder or other hierarchy, within a data structure. Generating the attribute index can also include indexing root information and area information for the data objects. This information can be used to determine a position of a data object within a database.
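Limiting a search to an area or folder can be sketched as a filter on an indexed location attribute. This example assumes, hypothetically, that each object's position in the hierarchy is indexed as a POSIX-style path such as "/area1/folderA"; it compares path components rather than raw string prefixes so that "/area1" does not accidentally match "/area10".

```python
from pathlib import PurePosixPath


def within_scope(folder_paths: dict[str, str], result_ids: set[str], scope: str) -> set[str]:
    """Keep only results whose indexed folder path lies at or below `scope`."""
    root = PurePosixPath(scope)
    scoped: set[str] = set()
    for object_id in result_ids:
        if object_id not in folder_paths:
            continue  # no location attribute indexed for this object
        path = PurePosixPath(folder_paths[object_id])
        if path == root or root in path.parents:
            scoped.add(object_id)
    return scoped
```

The filter can be applied to any of the result sets described above, for example after the access-controlled merge.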
Foregoing process 701 may be used to index metadata and/or the content (i.e., full-text) of data objects. Because indexing data objects can be time consuming, the data objects can be indexed periodically. Periodic indexing can include indexing only changed values.
Referring to process 801 of
In response to GetAttributeNameList, server 706 returns a list, TabAttributesNames, that contains data objects that have changed. The list also identifies attributes of those data objects, and which attributes have changed. Module 704 requests the values of the changed attributes in a message, GetAttributesValues, which identifies data objects for which values are requested. Server 706 responds to GetAttributesValues with a table of attribute values, TabAttributesValues, for data objects identified in GetAttributesValues. To obtain values of the attributes, instances of the data objects may be created and metadata therefor may be read.
The attribute values contain access control attributes, which can be used in search engine 212 to support searches using access control. The obtained attribute values are provided by module 704 to index server 708. Index server 708 updates its index using the values. Index server 708 confirms an index update with appropriate response messages.
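Indexing only changed values, as in process 801, can be sketched as an in-place update of the stored attribute records. In this illustration the index is a plain dictionary and the input contains only changed attributes, loosely corresponding to TabAttributesValues; the function name and data layout are assumptions.

```python
from typing import Any


def apply_delta(index: dict[str, dict[str, Any]],
                changed: dict[str, dict[str, Any]]) -> int:
    """Update only the attributes that changed, leaving all other indexed values alone.

    index:   object ID -> indexed attributes (e.g. access control attributes)
    changed: object ID -> {attribute name: new value} for changed attributes only
    Returns the number of attribute values actually written.
    """
    written = 0
    for object_id, new_values in changed.items():
        record = index.setdefault(object_id, {})
        for name, value in new_values.items():
            if record.get(name) != value:
                record[name] = value
                written += 1
    return written
```

Because untouched attributes are never rewritten, a change to one folder's access rights costs work proportional to that change rather than to the whole index.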
Referring now to
In a field 1002a, a user can specify additional attributes, such as a name, a description, a date of creation, a name of a creator, a change date, a name of a user who changed data, or other metadata. The user can also specify, in field 1002b, a search query, which, in this example, is “sap”. A search button 1004 is provided. Upon selecting search button 1004, search engine 212 is activated to obtain a search result set, which is provided in the page shown in
When selecting search button 1004, a search query, including any specified attributes, is provided to search engine 212. In addition, the user who initiated the search request is identified. This can be done by instructing the user to register with system 199 (
The search query and associated attributes are processed by search engine 212 in the manner described above in
As shown in
Computer 900 can communicate with computers 901 and 902 over network 990. Computer 900 has processor 910, memory 920, bus 930, and, optionally, input device 940 and output device 950 (I/O devices, user interface 960). The processes can be implemented by computer program product 100 (CPP), carrier 970 and/or signal 980.
With respect to computer 900, computers 901 and 902 are sometimes referred to as “remote computers”. Computer 901, 902 may be, for example, a server, a peer device, or other common network node, and typically has many or all of the elements described relative to computer 900. Computer 900 may be a conventional personal computer (PC), a desktop device or a hand-held device, a multiprocessor computer, a pen computer, a microprocessor-based or programmable consumer electronics device, a minicomputer, a mainframe computer, a personal mobile computing device, a mobile phone, a portable or stationary personal computer, a palmtop computer, or the like.
Processor 910 may be a CPU, a micro-controller unit (MCU), digital signal processor (DSP), or the like. Memory 920 includes elements that temporarily or permanently store data and instructions. Although memory 920 is illustrated as part of computer 900, memory can also be implemented in network 990, in computers 901, 902 and in processor 910 itself (e.g., cache, register), or elsewhere. Memory 920 can be a read only memory (ROM), a random access memory (RAM), or a memory with other access options. Memory 920 may be implemented by machine-readable media, for example: (a) magnetic media, like a hard disk, a floppy disk, or other magnetic disk, a tape, a cassette tape; (b) optical media, like optical disk (CD-ROM, digital versatile disk—DVD); (c) semiconductor media, like DRAM, SRAM, EPROM, EEPROM, and memory stick.
Memory 920 may be distributed. Portions of memory 920 can be removable or non-removable. For reading from media and for writing in media, computer 900 uses well-known devices, for example, disk drives, or tape drives.
Memory 920 stores modules such as, for example, a basic input output system (BIOS), an operating system (OS), a program library, a compiler, an interpreter, and a text-processing tool. Modules are commercially available and can be installed on computer 900. For simplicity, these modules are not illustrated.
CPP 100 includes program instructions and, optionally, data that cause processor 910 to execute method steps of the processes described herein. In other words, CPP 100 can control the operation of computer 900 and its interaction in network system 999 so that it operates in accordance with the processes described herein. For example, CPP 100 can be available as source code in any programming language, and as object code (“binary code”) in a compiled form.
Although CPP 100 is illustrated as being stored in memory 920, CPP 100 can be located elsewhere. CPP 100 can also be embodied in carrier 970. Carrier 970 is illustrated outside computer 900. For communicating between CPP 100 and computer 900, carrier 970 is inserted into input device 940. Carrier 970 may be implemented as any machine-readable medium, such as a medium largely explained above (memory 920). Generally, carrier 970 is an article of manufacture having a machine-readable medium with machine-readable program code to cause the computer to perform methods described herein. Further, signal 980 can also embody computer program product 100.
The processes described herein are not limited to use with any particular hardware and software; they may find applicability in any computing or processing environment and with any type of machine that is capable of running machine-readable instructions. All or part of the processes can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof.
All or part of the processes can be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Method steps associated with the processes can be performed by one or more programmable processors executing one or more computer programs to perform the functions of the processes. The method steps can also be performed by, and the processes can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) and/or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only storage area or a random access storage area or both. Elements of a computer include a processor for executing instructions and one or more storage area devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from, or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile storage area, including by way of example, semiconductor storage area devices, e.g., EPROM, EEPROM, and flash storage area devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
All or part of the processes can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a LAN and a WAN, e.g., the Internet.
Method steps associated with the processes can be rearranged and/or one or more such steps can be omitted to achieve the same, or similar, results to those described herein.
The data objects described herein may be used to store content, which can include, but is not limited to, text, images, graphics, video or other multimedia content.
Elements of different embodiments described herein may be combined to form other embodiments not specifically set forth above. Other embodiments not specifically described herein are also within the scope of the following claims.
Claims
1. A method of searching for data objects, the method comprising:
- searching a content index for first data objects, the content index storing content of plural data objects, the first data objects containing content that corresponds to content of a search query, the first data objects defining a first search result set;
- searching an attribute index for second data objects, the attribute index storing access control attributes for plural data objects, the second data objects having access control attributes that correspond to user access data associated with the search query, the second data objects defining a second search result set; and
- obtaining a search result based on the first search result set and the second search result set.
2. The method of claim 1, wherein the search result comprises pointers to at least some of the first and/or second data objects.
3. The method of claim 1, wherein obtaining comprises merging the first search result set and the second search result set.
4. The method of claim 3, wherein merging the first search result set and the second search result set comprises identifying data objects having a predetermined relationship to the first search result set and the second search result set.
5. The method of claim 3, wherein merging the first search result set and the second search result set comprises performing a logical AND operation on the first search result set and the second search result set to produce a resultant set.
6. The method of claim 5, further comprising:
- merging the resultant set with an additional search result set.
7. The method of claim 6, wherein merging the resultant set with the additional search result set comprises performing a logical OR operation on the resultant set and the additional search result set.
8. The method of claim 6, further comprising:
- obtaining the additional search result set by searching the attribute index for data objects that have access control attributes that correspond to the user access data and that have metadata that corresponds to the content associated with the search query.
9. The method of claim 1, wherein the access control attributes comprise metadata.
10. The method of claim 1, wherein the access control attributes comprise user access right information.
11. The method of claim 1, wherein the access control attributes comprise at least one of a user name, a user role, a user group name, and at least one user access right.
12. The method of claim 10, wherein the user access right information comprises at least one of read access, read and write access, and full access.
13. The method of claim 1, further comprising:
- generating the content index by indexing text of data objects.
14. The method of claim 1, further comprising:
- generating the attribute index by indexing metadata of data objects.
15. The method of claim 14, further comprising:
- updating the attribute index by indexing only changes to the access control attributes.
16. A computer program product tangibly embodied in an information carrier, the computer program product comprising instructions that, when executed, cause at least one processor to perform operations comprising:
- searching a content index for first data objects, the content index storing content of plural data objects, the first data objects containing content that corresponds to content of a search query, the first data objects defining a first search result set;
- searching an attribute index for second data objects, the attribute index storing access control attributes for plural data objects, the second data objects having access control attributes that correspond to user access data associated with the search query, the second data objects defining a second search result set; and
- obtaining a search result based on the first search result set and the second search result set.
17. The computer program product of claim 16, wherein obtaining comprises merging the first search result set and the second search result set.
18. The computer program product of claim 17, wherein merging the first search result set and the second search result set comprises identifying data objects having a predetermined relationship to the first search result set and the second search result set.
19. The computer program product of claim 17, wherein merging the first search result set and the second search result set comprises performing a logical AND operation on the first search result set and the second search result set to produce a resultant set.
20. A system for searching for data objects, the system comprising:
- memory that stores executable instructions; and
- a processing device that executes the executable instructions to: search a content index for first data objects, the content index storing content of plural data objects, the first data objects containing content that corresponds to content of a search query, the first data objects defining a first search result set; search an attribute index for second data objects, the attribute index storing access control attributes for plural data objects, the second data objects having access control attributes that correspond to user access data associated with the search query, the second data objects defining a second search result set; and obtain a search result based on the first search result set and the second search result set.
Type: Application
Filed: Apr 21, 2005
Publication Date: Feb 9, 2006
Inventors: Christian Deubel (Ludwigshafen), Gertrude Guth (Heidelberg)
Application Number: 11/111,508
International Classification: G06F 17/30 (20060101);