Method, apparatus and computer-readable medium for searching and navigating a document database
A method, apparatus, and computer-readable medium for searching and navigating a document database are provided. Document categories are assigned unique numeric category identifiers. Each document in a database is assigned to one of the document categories. Metadata is associated with each electronic document that includes the numeric category identifier corresponding to the category assigned to the document. The database may be searched or browsed based on category by utilizing the metadata. URLs may also be embedded in a Web page that includes a list of document identifiers and an index. The list of document identifiers is a list containing the identities of an arbitrary number of search results. The index identifies one of the documents in the list of document identifiers to be retrieved. When such a URL is selected, a Web server computer utilizes the list of document identifiers and the index to identify the document to be returned.
This patent application is a continuation of U.S. patent application Ser. No. 09/906,404, entitled “Method, Apparatus, And Computer-Readable Medium For Searching And Navigating A Document Database,” filed on Jul. 16, 2001 and assigned to the same assignee as this application. The aforementioned patent application is expressly incorporated herein, in its entirety, by reference.
TECHNICAL FIELD
The present invention generally relates to the field of computer databases. More specifically, the present invention relates to a method, apparatus, and computer-readable medium for searching and navigating a database containing electronic documents.
BACKGROUND OF THE INVENTION
The World Wide Web (“Web” or “WWW”) provides access to many types of facilities for searching or browsing databases of electronic documents. Using such facilities, users can search large databases of electronic documents for individual documents matching user-provided search terms. Alternatively, a user may simply browse through electronic documents available in the database. These facilities are typically provided by a search engine application program executing on a Web server computer and provide a great deal of functionality for users performing research or otherwise trying to quickly locate a specific electronic document.
While currently available facilities for searching or browsing a document database provide a great deal of functionality, these facilities are not without their drawbacks. One of the main drawbacks of current search facilities is that these facilities do not provide functionality for searching or browsing the contents of a document database based upon a specific category of electronic document. For instance, currently available search facilities would not permit a user to specify that search results should be limited solely to a category of electronic document, such as resumes, business proposals, or financial statements. This limitation can be frustrating to a user trying to quickly locate a document belonging to a certain category of documents.
The currently available search facilities would similarly not permit a user to simply browse through all available documents belonging to a certain category, such as scripts, expense reports, or announcements. This limitation can be similarly frustrating to users wanting to browse through available documents in a certain category of documents.
Currently available facilities for searching and browsing a document database are also very computationally expensive. One reason these facilities consume such a large amount of computational resources is that a new search must be performed each time a new document is requested from a list of search results or documents available for browsing. For instance, when a user performs a search, a list of search results is returned to the Web browser. Each time a document is requested from the list of search results, a uniform resource locator (“URL”) is transmitted to the search facility that requests that another search be performed using the same parameters as the previous search. However, the request also includes a parameter instructing the search facility to return a different document from the search results than previously returned. This parameter is called a “start hit” and is provided to the search facility to identify the document to be returned from the list of search results. In this manner, a user can view each of the documents identified in a list of search results.
While the “start hit” parameter allows the search facility to remain stateless with regard to client transactions, it also causes the consumption of large amounts of computational resources because many redundant searches must be performed. If a user requests many of the documents in a list of search results, the same search may be performed many times by the search facility. If the search facility has many users performing searches, the search facility may slow down considerably. This can be frustrating to a user who has to wait while many redundant searches are performed.
Accordingly, in light of the above, there is a need for a method, apparatus, and computer-readable medium for searching a document database that can organize documents by category, and that permits searching and browsing the database based upon document category. Moreover, there is a need for a method, apparatus, and computer-readable medium for navigating between documents in a document database that does not require a search each time a document is requested.
SUMMARY OF THE INVENTION
The present invention solves the above-described problems by providing a method, apparatus, and computer-readable medium for searching and browsing a document database. The method, apparatus, and computer-readable medium described herein are capable of organizing documents by document category and also provide a facility that allows users to search or browse the document database. The method, apparatus, and computer-readable medium provided herein also do not require a new search each time a document is selected from a list of search results or a list of documents available for browsing.
According to one actual embodiment of the present invention, a method is provided for searching a database comprising one or more electronic documents. According to this method, one or more document categories are defined and each document category is assigned a unique numeric category identifier. Each document in the database is then assigned to one of the document categories based upon the contents of the electronic document. Metadata is then associated with each of the electronic documents in the database that includes the numeric category identifier corresponding to the category assigned to the document. The metadata may be stored within the documents themselves or may be stored in a separate but associated file system.
Once metadata describing a category has been associated with each of the electronic documents in the database, the database may be searched or browsed based on category. For instance, a request may be submitted indicating that a search should be limited to documents assigned to a specified document category. In order to process such a request, the metadata associated with each of the documents in the database is searched for the numeric category identifier associated with the search request. Documents associated with metadata containing the specified numeric category identifier are then returned. A user may also browse electronic documents in the database based upon a specified category in a similar manner. The present invention also provides an apparatus and computer-readable medium for performing this method.
According to another actual embodiment of the present invention, a method is provided for navigating among documents contained in a document database. According to this method, URLs may be embedded in a Web page containing search results that include a list of document identifiers and an index into the list of document identifiers. The list of document identifiers comprises a list containing the identities of an arbitrary number of search results. The index identifies one of the documents in the list of document identifiers. When such a URL is selected, the Web server utilizes the list of document identifiers and the index to identify the document to be returned without performing a search. The identified document can then be returned to the requesting Web browser. Similar URLs may be constructed for browsing documents by requesting a document “previous” or “subsequent” to a current document. An apparatus and computer-readable medium are also provided for practicing this method.
BRIEF DESCRIPTION OF THE DRAWINGS
As described briefly above, the present invention provides a method, apparatus, and computer-readable medium for searching and navigating a document database. Referring now to the figures, in which like numerals represent like elements, an illustrative embodiment of the present invention will be described. In the actual embodiment described herein, aspects of the present invention are embodied in a document gallery. In particular, aspects of the present invention are embodied in a template gallery Web site available from MICROSOFT CORPORATION, of Redmond, Wash. The template gallery Web site provides access to a document database containing document templates.
Through an interface provided by the template gallery Web site, users can search or browse document templates. A user can also preview a template to see how the selected template would appear in a word processor or other application. A user can also download a template to their local computer for editing when a desired template has been located. Additional aspects of the template gallery Web site will be described below. While aspects of the invention are described in conjunction with the template gallery Web site provided by MICROSOFT CORPORATION, it should be appreciated that the invention described and claimed herein may be utilized with any type of Web site, search engine, or other computer program that provides access to a database of electronic documents.
Referring now to
The content binding application 12 is a Web-based tool that utilizes several ASP pages to read the contents of the document database 10 and, based upon the contents, to create the search engine file system 14 and the Web server file system 20. The content binding application 12 also creates the default home page of the template gallery Web site. Additional details regarding the search engine file system 14 and the Web server file system 20 are described below with reference to
The search engine sub-system 4 builds a search catalog 18 for searching the documents contained in the document database. The search engine sub-system 4 utilizes a search engine application program 16 to create the search catalog 18 and to receive and process search requests. The search engine application program 16 utilized in the actual embodiment of the present invention described herein is the SITE SERVER 3.0 application program from MICROSOFT CORPORATION. Other search engine application programs may also be used to implement the present invention.
The template gallery Web site utilizes the Web server sub-system 6 to interface with a Web browser executing on a client computer 24. As known to those skilled in the art, the Web server application program 22 receives and responds to requests for Web pages and other types of data. In order to process these requests, the Web server application program 22 utilizes the Web server file system 20. The Web server file system 20 contains the electronic documents served by the Web server application program and other information. Additional details regarding the Web server file system 20 will be described below in greater detail with respect to
Turning now to
The computer architecture shown in
The mass storage device 38 is connected to the CPU 28 through a mass storage controller (not shown) connected to the bus 36. The mass storage device 38 and its associated computer-readable media provide non-volatile storage for the Web server computer 26. Although the description of computer-readable media contained herein refers to a mass storage device, such as a hard disk or CD-ROM drive, it should be appreciated by those skilled in the art that computer-readable media can be any available media that can be accessed by the Web server computer 26.
By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
As described briefly above, the Web server computer 26 operates in a networked environment using logical connections to remote computers through a TCP/IP network 42, such as the Internet. The Web server computer 26 may connect to the TCP/IP network 42 through a network interface unit 44 connected to the bus 36. The Web server computer 26 may also include an input/output controller 46 for receiving and processing input from a number of devices, including a keyboard or mouse. Similarly, the input/output controller 46 may provide output to a display screen, a printer, or other type of output device.
As mentioned briefly above, a number of program modules and data files may be stored in the mass storage device 38 and RAM 32 of the Web server computer 26, including an operating system 40 suitable for controlling the operation of a networked server computer, such as the WINDOWS 2000 operating system from MICROSOFT CORPORATION. The mass storage device 38 and RAM 32 may also store one or more application programs. In particular, the mass storage device 38 and RAM 32 may store a Web server application program 22 for receiving and responding to hypertext transport protocol (“HTTP”) requests. In order to respond to such requests, the Web server application program 22 utilizes ASP pages 48. The ASP pages 48 utilized in the actual embodiment of the present invention described herein are discussed in greater detail below with reference to
The mass storage device 38 and RAM 32 of the Web server computer 26 also store a search engine application program 16. The search engine application program 16 provides search facilities to the Web server application program 22. In order to provide these facilities, the search engine application program 16 maintains a search catalog 18. The search catalog 18 is created by the search engine application program 16 on a periodic basis and is utilized at run-time to quickly locate the results of search queries. The search engine application program 16 also utilizes a search engine file system 14, which is described in greater detail below with respect to
Referring now to
Turning now to
Referring now to
If a user selects one of the category links 62, such as the “Legal” category link, the user will be provided with the screen display shown in
If a user selects the “Corporate Forms” sub-category, the user will be presented with the screen display shown in
For each document template, a “Go To Preview” hyperlink 70 is also provided. The hyperlink 70 allows the user to preview the selected document template within the Web browser 64. If a user selects the hyperlink 70, the user will be presented with a visual display as shown in
If a user agrees to download and install the gallery control application, the gallery control will be received and executed. The user will then be presented with the screen display shown in
The screen display shown in
In the URL shown in Table 1, the “L=440,441,442,444,445,446 . . . ” parameter comprises the list of document identifiers and the “I=0” parameter is the index. This indicates to the Web server application program that the first document template identified in the list of document identifiers, counting from zero, should be transmitted. The “previous template” and “next template” hyperlinks are constructed in a similar manner that permits the Web server application program to locate the next or previous document without performing an additional search. The “next template” hyperlink 78 is shown in Table 2.
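By way of illustration, and not limitation, the use of the "L" and "I" parameters may be sketched in Python as follows. The sketch assumes that the two parameters arrive as an ordinary URL query string; the function name resolve_document is hypothetical, and only the six document identifiers shown before the elision are used.

```python
from urllib.parse import parse_qs


def resolve_document(query_string):
    """Return the document identifier selected by the I index into the L list."""
    params = parse_qs(query_string)
    # "L" carries the comma-separated list of document identifiers.
    doc_ids = [int(part) for part in params["L"][0].split(",")]
    # "I" is the index into that list, counting from zero.
    index = int(params["I"][0])
    # A plain list lookup identifies the document; no search is performed.
    return doc_ids[index]


print(resolve_document("L=440,441,442,444,445,446&I=0"))
```

With the index set to zero, the sketch selects document identifier 440, the first entry in the list.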
If the “next template” hyperlink 78 is selected, the URL is posted to the Web server application and the “next” document template is retrieved and displayed as shown in
As shown in Table 2, the hyperlink 78 also includes a category identifier for the list of document identifiers. The category identifier uses dotted notation to describe a particular category. For instance, the hyperlink 78 includes the category identifier “0.96.99”. This category identifier corresponds to the “Legal→Corporate Forms” category.
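The dotted notation may be illustrated with a short Python sketch. The sketch assumes that each dot-separated component is a numeric identifier for one level of the category hierarchy, so that "0.96" would denote the parent of "0.96.99"; the description does not state which component corresponds to which level, and the function names are hypothetical.

```python
def parse_category(dotted):
    """Split a dotted category identifier into a tuple of numeric components."""
    return tuple(int(part) for part in dotted.split("."))


def is_within(category, ancestor):
    """Return True if `category` equals `ancestor` or lies beneath it."""
    cat = parse_category(category)
    anc = parse_category(ancestor)
    # A category lies within an ancestor when the ancestor is a prefix of it.
    return cat[:len(anc)] == anc


print(parse_category("0.96.99"))
print(is_within("0.96.99", "0.96"))
```

Under this assumption, a request to browse a category could match documents in its sub-categories by comparing prefixes.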
At block 1104, the search results or document templates matching the requested category are displayed to the user. At block 1104, the user may request another search. If the user requests another search, the search parameters are received and the search is conducted as described above. At block 1104, the user may also select a hyperlink to a document template. If the user makes such a selection, the Routine 1100 branches to block 1110, described below. The user may also select a category link at block 1104 to view document templates in a specific category. If the user selects a category link, the Routine 1100 branches from block 1104 to block 1108.
At block 1108, the “categorize.asp” Web page is generated by the search engine application. The “categorize.asp” Web page generates a list of document templates within a specific category. At block 1108, a user may request to search the database of document templates. If the user makes such a selection, the Routine 1100 branches to block 1106, where a determination is made as to whether the user would like to restrict the search to documents contained within a specified category of document templates. For instance, a user may request that only resumes be searched for a particular search term. If the user does not request a category-based search, the Routine 1100 branches back to block 1104, where the “result.asp” Web page is called with the query string provided by the user. If the user does request a category-based search, the Routine 1100 branches back to block 1108, where the “categorize.asp” Web page is called with the category name and the query string provided by the user. The category-based search results are then displayed to the user.
If, at block 1108, a user selects a category link, the “categorize.asp” Web page is again called with a parameter identifying the specific category selected by the user. The sub-categories or document templates contained within the category are then displayed to the user. At block 1108, a user may also select a hyperlink directed to one of the document templates. If the user selects such a template link, the Routine 1100 branches to block 1110.
At block 1110, the “preview.asp” Web page is generated by the Web server application. The “preview.asp” Web page displays a preview of the selected document template in conjunction with the gallery control application described above. In particular, the “preview.asp” Web page generates a preview using the “tpX.asp” 1112 file corresponding to the selected document template. As described above with respect to
If, at block 1110, a user selects a hyperlink for a next or previous document, the Routine 1100 continues to block 1114. At block 1114, a determination is made as to whether the next or previous document template is out of range. In particular, a determination is made as to whether the index references a document template that is out of range of the list of document identifiers. If the index is not out of range, the Routine 1100 returns to block 1110, where the hyperlink is passed to the Web server application program and a preview for the next or previous document template is generated. If the index is out of range, the Routine 1100 branches from block 1114, to block 1116 where a predetermined number of additional documents are identified by the search engine application program. The identities of the additional documents are then utilized to create a new list of document identifiers and a new index. The Routine 1100 then returns back to block 1110, where a preview for the next or previous document template is generated.
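The range check of block 1114 and the list replacement of block 1116 may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the manner in which the search engine application program identifies the additional documents is not detailed above, so it is represented by a hypothetical callback named fetch_more, and the choice to remain on the current document when no further documents exist is likewise an assumption.

```python
def step(doc_ids, index, delta, fetch_more):
    """Return the (list, index) pair for the next (+1) or previous (-1) preview."""
    new_index = index + delta
    # Block 1114: the index is in range, so the existing list is reused.
    if 0 <= new_index < len(doc_ids):
        return doc_ids, new_index
    # Block 1116: the index is out of range, so additional documents are
    # requested (hypothetical callback) to form a new list and a new index.
    new_ids = fetch_more(doc_ids, delta)
    if not new_ids:
        return doc_ids, index  # assumption: stay on the current document
    return new_ids, (0 if delta > 0 else len(new_ids) - 1)


def example_fetch(current_ids, delta):
    # Stand-in for the search engine; returns a fixed list for illustration.
    return [447, 448, 449] if delta > 0 else []


print(step([440, 441, 442], 2, 1, example_fetch))
```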
Referring now to
The Routine 1200 begins at block 1202, where unique numeric category identifiers are assigned to document categories. For instance, the number 123 may be assigned to a category containing resumes. Other categories may be similarly assigned numeric category identifiers. From block 1202, the Routine 1200 continues to block 1204, where each of the electronic documents stored in the document database is assigned to a document category. The documents are assigned to categories based upon content. So, for instance, a document comprising a resume would be assigned to a category for resumes. Document categories may also be subdivided into sub-categories. Each sub-category may also be given a unique numeric identifier.
From block 1204, the Routine 1200 continues to block 1206, where metadata is associated with each electronic document. The metadata comprises the unique numeric category identifier assigned to the document. The metadata may be stored in the electronic document itself or may be stored external to the document in another file. The metadata is then stored in the indexed catalog used by the search engine application program to search for documents. In this manner, the category metadata may be utilized to search for documents matching a certain unique category identifier. Moreover, additional metadata such as the identity of the provider or author of the document, a title of the document, or a text description of the document category to which the document is assigned may also be included. The metadata may also be utilized by the search engine application program when searching for documents. The Routine 1200 then continues to block 1208, where it ends.
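The operations of Routine 1200 may be sketched in Python as follows. The sketch is illustrative only: the identifier 123 for resumes is taken from the description above, while the identifier 124, the keyword rule used to assign a category, and the dictionary layout of the catalog are assumptions introduced for the example.

```python
# Block 1202: each document category receives a unique numeric identifier.
CATEGORY_IDS = {"resumes": 123, "expense reports": 124}  # 124 is hypothetical


def assign_category(text):
    """Block 1204: toy content-based rule (hypothetical) selecting a category."""
    return "resumes" if "work experience" in text.lower() else "expense reports"


def build_catalog(documents):
    """Block 1206: associate category metadata with each document identifier."""
    catalog = {}
    for doc_id, doc in documents.items():
        category = assign_category(doc["text"])
        catalog[doc_id] = {
            "category_id": CATEGORY_IDS[category],
            "category_description": category,
            "title": doc["title"],
        }
    return catalog


documents = {
    440: {"title": "Chronological resume", "text": "Work experience and education"},
    441: {"title": "Travel expenses", "text": "Mileage and lodging totals"},
}
catalog = build_catalog(documents)
print(catalog[440]["category_id"], catalog[441]["category_id"])
```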
Referring now to
If, at block 1306, it is determined that no additional search term was provided, the Routine 1300 continues to block 1310. At block 1310, the identities of all electronic documents in the database associated with metadata corresponding to the specified document category are returned. From blocks 1310 and 1312, the Routine 1300 continues to block 1314, where it ends.
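A category-limited search of the kind performed by Routine 1300 may be sketched in Python as follows. The sketch assumes a catalog held as a dictionary keyed by document identifier and scans it linearly, whereas an actual search engine would consult its indexed catalog; the field names, the identifier 124, and the sample entries are hypothetical.

```python
def search_category(catalog, category_id, term=None):
    """Return identifiers of documents in a category, optionally matching a term."""
    # Only the metadata is consulted to restrict the results to the category.
    hits = [doc_id for doc_id, meta in catalog.items()
            if meta["category_id"] == category_id]
    if term is None:
        # No search term was provided: every document in the category is returned.
        return hits
    # A search term was provided: keep documents whose text contains the term.
    return [doc_id for doc_id in hits
            if term.lower() in catalog[doc_id]["text"].lower()]


catalog = {
    440: {"category_id": 123, "text": "Resume for a software engineer"},
    441: {"category_id": 123, "text": "Resume for a sales manager"},
    442: {"category_id": 124, "text": "Expense report for an engineer"},
}
print(search_category(catalog, 123))
print(search_category(catalog, 123, "engineer"))
```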
Turning now to
From block 1406, the Routine 1400 continues to block 1408, where the index is utilized to locate the identity of the requested document in the list of document identifiers. From block 1408, the Routine 1400 continues to block 1410, where a URL for the next document in the list of document identifiers is generated. The URL is generated by incrementing the index and determining whether the index exceeds the limits of the list of document identifiers. If the index exceeds the limits of the list of document identifiers, a new list is generated.
From block 1410, the Routine 1400 continues to block 1412, where the URL for the previous document in the list of document identifiers is generated. The URL is generated in the same manner as for the next document, except that the index is decremented. From block 1412, the Routine 1400 continues to block 1414, where the requested document, the URL for the next document, and the URL for the previous document are transmitted to the client computer that requested the electronic document. The Routine 1400 then continues to block 1416, where it ends.
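The generation of the next and previous URLs in Routine 1400 may be sketched in Python as follows. The "L" and "I" parameter names and the "preview.asp" page name appear in the description above; the exact URL layout and the function names are assumptions. The step in which a new list is generated when the index exceeds the limits of the list is omitted from this sketch and is represented by None.

```python
from urllib.parse import urlencode


def nav_url(doc_ids, index, page="preview.asp"):
    """Build a URL carrying the list of document identifiers and an index."""
    query = urlencode(
        {"L": ",".join(str(d) for d in doc_ids), "I": index},
        safe=",",  # keep the commas in the list readable
    )
    return f"{page}?{query}"


def build_response(doc_ids, index):
    """Return the requested identifier with URLs for the next and previous documents."""
    current = doc_ids[index]  # block 1408: locate the document by index
    # Block 1410: increment the index for the "next" URL.
    next_url = nav_url(doc_ids, index + 1) if index + 1 < len(doc_ids) else None
    # Block 1412: decrement the index for the "previous" URL.
    prev_url = nav_url(doc_ids, index - 1) if index - 1 >= 0 else None
    return current, next_url, prev_url


print(build_response([440, 441, 442], 1))
```

Because each generated URL carries the full list and an index, the server can remain stateless and still serve the neighboring document without repeating the search.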
Based on the foregoing, it should be appreciated that the present invention provides a method, apparatus, and computer-readable medium for searching and navigating a document database. The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.
Claims
1. A method for searching a database comprising a plurality of electronic documents, said method comprising:
- assigning a unique numeric category identifier to each of a plurality of document categories;
- assigning each of said electronic documents to one of said document categories, said assigned category based upon the content of each of said electronic documents;
- associating metadata with each of said electronic documents, said metadata comprising said unique numeric category identifier corresponding to said document category assigned to each of said electronic documents;
- receiving a request to search said database, said search limited to documents assigned to a specified document category associated with said request;
- searching said metadata associated with each of said electronic documents; and
- returning an identity of each of said electronic documents, wherein each identity is returned as a list of document identifiers and a pointer into said list corresponding to a single one of said electronic documents.
2. The method of claim 1, further comprising:
- receiving a request to display a second document, the second document identified in said list; and
- utilizing said pointer and said list to identify said second document and to retrieve said second document from said database without performing a second search of the metadata associated with each of said electronic documents.
3. The method of claim 1, wherein said metadata for each of said electronic documents is stored in an indexed catalog.
4. The method of claim 1, wherein said request to search said database further comprises a search term and wherein returning an identity comprises returning an identity of each of said electronic documents associated with metadata having a numeric category identifier associated with said specified document category and containing said search term.
5. The method of claim 1, wherein said electronic documents comprise document templates.
6. The method of claim 5, wherein said unique numeric identifier may further comprise numeric information describing one or more sub-categories.
7. The method of claim 6, wherein said metadata further comprises an identity of an author or provider of each of said electronic documents.
8. The method of claim 7, wherein said metadata further comprises a text description of said document category to which each of said electronic documents is assigned.
9. The method of claim 8, wherein said metadata further comprises a title for each of said electronic documents.
10. A method for navigating a database comprising a plurality of electronic documents, each of said electronic documents identified by a document identifier, said method comprising:
- receiving a request for one of said electronic documents, said request comprising a list of document identifiers and a pointer into said list corresponding to said requested electronic document; and
- in response to said request, utilizing said pointer and said list of document identifiers to identify said requested electronic document from said plurality of electronic documents without performing a search of said database.
11. The method of claim 10, wherein said requested electronic document comprises one of said plurality of electronic documents identified in said list of document identifiers previous to a previously transmitted electronic document.
12. The method of claim 11, further comprising:
- decrementing said pointer;
- generating a hypertext link comprising said decremented pointer and said list of document identifiers; and
- transmitting said hypertext link.
13. The method of claim 10, wherein said requested electronic document comprises one of said plurality of electronic documents identified in said list of document identifiers and subsequent to a previously transmitted electronic document.
14. The method of claim 11, further comprising:
- incrementing said pointer;
- generating a hypertext link comprising said incremented pointer and said list of document identifiers; and
- transmitting said hypertext link.
15. The method of claim 10, wherein each of said plurality of documents in said database is named consistently with a corresponding document identifier.
16. The method of claim 15, wherein said list of document identifiers is generated in response to searching said database.
17. The method of claim 10, wherein said list of document identifiers is generated in response to a request to browse said database.
18. A computer-readable medium comprising computer-executable instructions which, when executed by a computer, cause the computer to perform a method for navigating a database comprising a plurality of electronic documents, each of said electronic documents identified by a document identifier, said method comprising:
- receiving a request for one of said electronic documents, said request comprising a list of document identifiers and a pointer into said list corresponding to said requested electronic document; and
- in response to said request, utilizing said pointer and said list of document identifiers to identify said requested electronic document from said plurality of electronic documents without performing a search of said database.
19. The computer-readable medium of claim 18, wherein said requested electronic document comprises one of said plurality of electronic documents identified in said list of document identifiers previous to a previously transmitted electronic document.
20. The computer-readable medium of claim 18, wherein said requested electronic document comprises one of said plurality of electronic documents identified in said list of document identifiers and subsequent to a previously transmitted electronic document.
Type: Application
Filed: Dec 30, 2004
Publication Date: May 26, 2005
Patent Grant number: 7660781
Applicant: Microsoft Corporation (Redmond, WA)
Inventor: Bill Chau (Kirkland, WA)
Application Number: 11/027,419