DATABASE INSERTION AND RETRIEVAL SYSTEM AND METHOD

Info

Publication number: 20080052281
Type: Application
Filed: Aug 23, 2006
Publication Date: Feb 28, 2008
Applicant:
Inventors: Matthew R. Liberty (Apalachin, NY), Bruce R. Wilde (Vestal, NY)
Application Number: 11/466,478

Abstract

A database processing system and method for inserting documents formatted in accordance with a markup language into a database, and for retrieving those documents from the database.

Description


The present invention relates generally to data processing, and, more particularly, to database processing for information provided in markup language form.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a system block diagram of a database insertion and retrieval system according to various embodiments;

FIG. 2 is a functional block diagram of the information storage and retrieval application according to various embodiments;

FIG. 3 is a database insertion functional flow diagram according to various embodiments;

FIG. 4 is an illustration of the mapping of attributes contained in input information to identifiers or keys in a hash table index of output information according to various embodiments;

FIG. 5 is a database retrieval functional flow diagram according to various embodiments;

FIGS. 6A and 6B are a flow chart of a database insertion method according to various embodiments;

FIGS. 7A and 7B are a flow chart of a database retrieval method according to various embodiments;

FIG. 8 is an example stylesheet defining attributes to be used for keys according to various embodiments;

FIGS. 9A and 9B are an example stylesheet used to obtain requested XML according to various embodiments;

FIGS. 10A and 10B are an example stylesheet used to obtain requested XML for a particular locale according to various embodiments; and

FIGS. 11A and 11B are an example stylesheet used to obtain requested XML for a particular version according to various embodiments.

DETAILED DESCRIPTION

Embodiments are directed generally to a system and method for inserting document text into a database and for retrieving portions of the document text from that database. In particular, various embodiments can comprise a system and methods for generating one or more keys from selected attributes occurring in input information, and for inserting output information comprising the keys into a database.

With respect to FIG. 1, there is shown a database insertion and retrieval system 100 according to various embodiments. As shown in FIG. 1, the database insertion and retrieval system 100 can comprise a server 101 provided in communication with one or more client devices 102 using a network 103. In various embodiments, the server 101 can comprise an information storage and retrieval application 105 provided in communication with a database 107. The server 101 can further comprise a communication interface configured to accomplish packet-based communication using the network 103.

In various embodiments, the information storage and retrieval application 105 can comprise one or more servlets that include a sequence of programmed instructions that, when executed by a processor of the server 101, cause the server 101 to be configured to perform database insertion and retrieval functions as described herein.

The database 107 can comprise a memory manager 109 and a storage device 111 provided in communication with the memory manager 109. In various embodiments, the database 107 can store and retrieve information or data in response to one or more Structured Query Language (SQL) instructions. The storage device 111 can comprise a hard disk drive configured to store information in accordance with SQL. Further, the memory manager 109 can comprise a database manager that includes a local memory 112. In various embodiments, the local memory 112 of the memory manager 109 can comprise a hash table index 113 and recently accessed database information from the storage device 111. In various embodiments, the local memory 112 of the memory manager 109 can have a lower access latency than the storage device 111. For example, the local memory 112 can comprise a Random Access Memory (RAM) and the storage device 111 can comprise a hard disk drive, in which case the local memory 112 can have an access time on the order of ten times faster than the storage device 111. In various embodiments, the local memory 112 can comprise a fixed memory size specified by a target threshold size parameter. The memory manager 109 can be configured to remove the oldest information in the local memory 112 to provide capacity to store the transformed information and maintain the size of the local memory 112 below the target threshold size. The target threshold size and the frequency of checking whether or not the target threshold size has been exceeded can each be configurable parameters controlled by the user.
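The oldest-first removal policy described above can be illustrated with a minimal Python sketch. The class name `LocalMemory`, the use of character counts as the size measure, and the choice to check the threshold on every insertion are assumptions made for illustration only; the description leaves the checking frequency as a configurable parameter.

```python
from collections import OrderedDict


class LocalMemory:
    """Hypothetical stand-in for local memory 112 that evicts the oldest entries first."""

    def __init__(self, threshold_size):
        self.threshold_size = threshold_size  # target threshold size parameter
        self._entries = OrderedDict()         # insertion order doubles as age
        self._size = 0                        # total characters currently held

    def put(self, key, text):
        # Replacing an existing entry first releases the space it occupied.
        if key in self._entries:
            self._size -= len(self._entries.pop(key))
        self._entries[key] = text
        self._size += len(text)
        self.trim()  # assumption: the threshold is checked on every insertion

    def get(self, key):
        return self._entries.get(key)

    def trim(self):
        # Remove the oldest information until the size no longer exceeds the threshold.
        while self._size > self.threshold_size and self._entries:
            _, oldest = self._entries.popitem(last=False)
            self._size -= len(oldest)
```

In a deployment that checks the threshold periodically rather than on each insertion, `trim` could instead be called from a timer.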

The client device 102 can comprise a Personal Computer (PC) or workstation including, but not limited to, a desktop PC, laptop PC, tablet PC, Personal Digital Assistant (PDA), cellular terminal or handset, wireless terminal or handset, Internet appliance, or any other such device. In various embodiments, the client device 102 can comprise a communication interface configured to accomplish packet-based communication using the network 103. For example, the client device 102 can include a browser application such as Microsoft® Internet Explorer™ available from Microsoft Corporation of Redmond, Wash., or Mozilla Firefox™ available from the Mozilla Foundation of Mountain View, Calif. In various embodiments, the client device 102 can communicate with the server 101 using the network 103 in accordance with the HyperText Transfer Protocol (HTTP). For example, a user can establish a session with the server 101 by entering the Uniform Resource Locator (URL) associated with the server 101 into an address field of the browser application. In various embodiments, the client device 102 can also comprise a standard set of hardware and software such as, but not limited to, a processor, Read Only Memory (ROM), Random Access Memory (RAM), communication ports, user interface, operating system, application programs, as well as standard peripherals such as, but not limited to, a data entry device such as a keyboard, a pointing and selection device such as a mouse or trackball, and a display. The operating system can be configured to support application programs configured to accept user input via the user interface in the form of interactive pages comprising static and dynamic display data and data entry fields.

In various embodiments, the network 103 can comprise a packet-based network configured to transfer packet-based information. For example, the network 103 can comprise an Internet Protocol (IP) based network in which information is transferred in accordance with the Transmission Control Protocol (TCP)/IP standard such as, for example, the Internet. In various embodiments, the network 103 can comprise an intranet, a wireless communication network such as Global System for Mobile Communications (GSM) or Code Division Multiple Access (CDMA), a satellite communication network, or a Local Area Network (LAN) or Wireless LAN based on, for example, the IEEE 802.11 standard. Other variations are possible. For example, the network 103 can also comprise a connection-based network such as, for example, the Public Switched Telephone Network (PSTN).

With respect to FIG. 2, there is shown a functional block diagram of the information storage and retrieval application 105 according to various embodiments. As shown in FIG. 2, in various embodiments, the information storage and retrieval application 105 can comprise an input/output portion 150, a translator portion 160, and a database interface portion 170. In various embodiments, the input/output portion 150, the translator portion 160, and the database interface portion 170 can together comprise one code object. In various alternative embodiments, each of the portions 150, 160 and 170 can comprise multiple objects provided in communication using, for example, interprocess communication techniques.

In various embodiments, the input/output portion 150 can comprise a sequence of Java™ instructions that configure the information storage and retrieval application 105 to input and output information in accordance with the HyperText Transfer Protocol (HTTP). Other embodiments are possible. For example, in various alternative embodiments, the information storage and retrieval application 105 can comprise one or more Common Gateway Interface (CGI) scripts.

Further, in various embodiments, the translator portion 160 can comprise a markup language translator configured to read input information and translate the input information into output information in accordance with translation instructions. In various embodiments, the input information and output information can each be a text stream formatted in accordance with the Extensible Markup Language (XML). Further, the markup language translator can be configured to perform Extensible Stylesheet Language Transformations (XSLT) in accordance with translation instructions specified by one or more Extensible Stylesheet Language (XSL) stylesheets 165. The translator portion 160 can accept the input information as an input file or as a document contained in an input file. The translator portion 160 can provide the output information as an output file. The translator portion 160 can thus operate as an XSLT processor configured to translate a first XML document into a second XML document, for example. In various embodiments, the stylesheets 165 can be instantiated at time of application installation. In various embodiments, the stylesheets 165 are maintained in non-volatile storage of the server 101, but are not included in the database 107.

In various embodiments, the database interface portion 170 can be configured to communicate with the database 107. For example, the database interface portion 170 can be configured to generate and output to the database 107 an information storage request or an information retrieval request. The information storage and information retrieval requests can be formatted in accordance with the Structured Query Language (SQL). Database requests from the database interface portion 170 can be received by the memory manager 109 of the database 107. In various embodiments, the database interface portion 170 can comprise a Java™ servlet.

In operation, in various embodiments, the translator portion 160 can be configured to receive input information and translate the input information, in accordance with translation instructions specified by one or more stylesheets 165, into output information to be stored in the database 107. In particular, the translator portion 160 can be configured to generate a key from an attribute occurring in the input information, the input information being formatted in accordance with a markup language. In various embodiments, the key can be an index key used for retrieving the output information from the database 107. A different key can be associated with each of many different types of attributes. In various embodiments, the attributes in the input information that are used by the translator portion 160 to generate the keys can be defined in one or more stylesheets 165. The stylesheets 165 can be customized to generate keys from a variety of attribute types according to the needs of the user. FIG. 8 is an example stylesheet 165 defining attributes to be used for keys according to various embodiments.

Furthermore, stylesheets 165 can be used to specify to the translator portion 160 the manner in which to add the keys to a hash table index. In various embodiments, the hash table index can comprise an internal database index.

With respect to FIG. 3, there is shown a database insertion functional flow diagram in accordance with various embodiments. As shown in FIG. 3, the translator portion 160 can receive the input information 301 and apply a first stylesheet 165 to generate keys based on occurrences of the attribute(s) specified in the first stylesheet 165. A key can comprise an identifier that serves to identify the information, such as markup language data or a tag, associated with the corresponding attribute. In various embodiments, the translator portion 160 can be configured to generate one such identifier for every occurrence of the corresponding attribute in the input information 301. Each such generated identifier can be included in the output information 302. Thus, the output information 302 generated by the translator portion 160 can comprise one or more of the identifiers, each of which identifiers corresponds to an occurrence of the selected attribute(s) in the input information 301, each of which identifiers identifies the information associated with the attribute in the input information 301, and each of which identifiers is added or inserted into the database 107. In various embodiments, the output information 302 can comprise keys in a hash table index. A second stylesheet 165 can be used to specify to the translator portion 160 the manner in which to add the keys to a hash table index. The hash table index can comprise an internal database index.

In various embodiments, the database interface portion 170 can be configured to apply an insertion instruction page 303 to select insertion of the output information 302 into the database 107 as either a single document or file, or as several compressed documents or files. The insertion instruction page 303 can comprise a markup language file such as, for example, a HyperText Markup Language (HTML) page. The database interface portion 170 can then upload the input information 301 for insertion into the database 107. In various embodiments, the database interface portion 170 can comprise a Java™ servlet. The input information 301 can comprise XML formatted information. In various embodiments, the input information 301 can be compressed using a compression algorithm such as, for example, the java.util.zip Java™ compression utility of the Java™ 2 Platform Std. Ed. v 1.4.2 available from Sun Microsystems of Santa Clara, Calif. In various alternative embodiments, another ZIP compression algorithm can be used such as, for example, PKZIP available from PKWARE, Inc. of Milwaukee, Wis., or the WinZip™ product available from WinZip Computing.

Furthermore, in various embodiments, the translator portion 160 can be configured to generate multiple levels of identifiers. Each level of identifiers can be hierarchically related to another one of the levels (for example, the immediately preceding level or the immediately following level). In various embodiments, a top-level identifier can serve to identify an entire input information 301 file such as, for example, an XML file. Multiple sub-level identifiers can be provided, wherein each sub-level identifier serves to identify any XML in the input information 301 that meets the attribute criteria specified in the applicable stylesheet 165. Further, the translator portion 160 can be configured to index all of the identifiers, or keys, by associating each sub-level identifier with its immediately preceding (for example, next highest priority) sub-level identifier, and by associating each sub-level identifier with its top-level identifier.

Example input information 301 is set forth in Table 1 below. As shown in Table 1, the input information 301 can comprise an XML file.

TABLE 1
Input Information

<?xml version='1.0' encoding='ISO-8859-1' ?>
<task ID="my.test" type="merc">
  <title><cdata>Tests my links</cdata></title>
  <objective><cdata>testing my links</cdata></objective>
  <subtask ID="my.test.link">
    <title><cdata>linktest</cdata></title>
    <step> </step>
  </subtask>
  <subtask ID="my.test.run">
    <title><cdata>run my test</cdata></title>
    <step> </step>
  </subtask>
</task>

Upon receiving the input information 301 shown in Table 1, the translator portion 160 can apply the first stylesheet 165 to generate the identifiers. For example, if the stylesheet 165 specifies the “ID” attribute in the input information 301 to be used to generate identifiers, the translator portion 160 can generate one identifier for every occurrence of the “ID” attribute encountered in the input information 301. Each generated identifier is included in the output information 302. Thus, the output information 302 generated by the translator portion 160 can comprise one or more of the identifiers, each of which identifiers corresponds to an occurrence of the selected attribute(s) in the input information 301, each of which identifiers identifies the information associated with the attribute in the input information 301, and each of which identifiers is added or inserted into the database 107.

In various embodiments, the output information 302 can comprise keys in a hash table index. A second stylesheet 165 can be used to specify to the translator portion 160 the manner in which to add the keys to a hash table index. The hash table index can comprise an internal database index. In various embodiments, the hash table index can be stored using the hash table 113 of the memory manager 109.

Example output information 302 is set forth in Table 2 below. As shown in Table 2, the output information 302 can comprise an XML file.

TABLE 2
Output Information

ID = “my.test”, Top-level = “my.test”
ID = “my.test.link”, Top-level = “my.test”
ID = “my.test.run”, Top-level = “my.test”
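The mapping from the input information of Table 1 to the output information of Table 2 can be approximated with the short Python sketch below. It uses Python's `xml.etree.ElementTree` in place of the XSLT stylesheet 165, so it is not the stylesheet of FIG. 8. The function name `generate_keys` and the assumption that the root element carries the top-level identifier are illustrative; the sample is an abbreviated form of Table 1.

```python
import xml.etree.ElementTree as ET

# Abbreviated form of the Table 1 input information.
SAMPLE = """<task ID="my.test" type="merc">
  <title><cdata>Tests my links</cdata></title>
  <subtask ID="my.test.link"><title><cdata>linktest</cdata></title><step/></subtask>
  <subtask ID="my.test.run"><title><cdata>run my test</cdata></title><step/></subtask>
</task>"""


def generate_keys(xml_text, attribute="ID"):
    """Return one (identifier, top_level) pair per occurrence of `attribute`."""
    root = ET.fromstring(xml_text)
    # Assumption: the root element's attribute value is the top-level identifier.
    top_level = root.get(attribute)
    # iter() walks the tree in document order, starting with the root itself.
    return [(element.get(attribute), top_level)
            for element in root.iter()
            if element.get(attribute) is not None]


for identifier, top in generate_keys(SAMPLE):
    print(f'ID = "{identifier}", Top-level = "{top}"')
```

Elements without the selected attribute, such as `title` and `step`, produce no key, which matches the one-identifier-per-occurrence behavior described above.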

With respect to FIG. 4, there is shown an illustrative mapping of attributes contained in the input information 301 to identifiers or keys in the hash table index of the output information 302 for the example input and output information of Tables 1 and 2 in accordance with the database insertion process 300. As shown in FIG. 4, the “ID” attribute is specified by the stylesheet 165 for generating database identifiers, or keys. Thus, the translator portion 160 generates multiple levels of identifiers for occurrences of the “ID” attribute in the input information 301. In particular, the “ID” attribute for “my.test” is assigned as the top-level identifier, and the “ID” attributes for “my.test.link” and “my.test.run” are determined to be sub-level identifiers. As shown in FIG. 4, the sub-level identifiers for “my.test.link” and “my.test.run” are associated with the top-level identifier “my.test.” Therefore, the sub-level identifiers for “my.test.link” and “my.test.run” are hierarchically related to the top-level identifier “my.test.” The top-level identifier “my.test” serves to identify the entire input information 301 file, while each sub-level identifier serves to identify the XML in the input information 301 associated with that sub-level identifier. The hierarchically-related top-level identifiers and sub-level identifiers shown in the output information 302 of FIG. 4 can comprise a hash table index 113 useful for retrieving all or a portion of the input information 301 from the database 107. Thus, the input information 301 can be inserted into the database 107 by the database interface portion 170 in accordance with the insertion instruction page 303 as described with respect to FIG. 3, for example, as a compressed file.

After insertion into the database 107, the inserted document text, for example, markup language information of the input information 301, can be retrieved from the database 107 using the hash table index (for example, the output information 302). With respect to FIG. 5, there is shown a database retrieval flow diagram in accordance with various embodiments. As shown in FIG. 5, upon receiving a database read request from the client device 102, the input/output portion 150 can forward the database read request to the database interface portion 170. In various embodiments, the client device 102 can submit a database read request comprising a specific identifier to be obtained from the database 107. For example, the database read request can comprise the identifier “ID=‘my.test.link’.” It will be recalled from the previous example that “ID=‘my.test.link’” is a sub-level identifier that is hierarchically related to the top-level identifier “ID=‘my.test’.” An example hash table index is shown in Table 3 below.

TABLE 3
Hash Table Index

Key 1    ID = “my.test”, Top-level = “my.test”
Key 2    ID = “my.test.link”, Top-level = “my.test”
Key 3    ID = “my.test.run”, Top-level = “my.test”
Key 4    ID = “hello.world”, Top-level = “hello.world”
Key 5    ID = “justin.time”, Top-level = “justin.time”
Key 6    ID = “outof.time”, Top-level = “justin.time”

Although six keys are shown in Table 3, it is to be understood that any number of keys can be included in the hash table index. The input/output portion 150 can forward the database read request to the database interface portion 170. Upon receiving the database read request, the database interface portion 170 can search the keys in the hash table index 113, via table look-up or other method, for the identifier contained in the database read request. For example, the database interface portion 170 can perform a table lookup of the keys in the hash table index 113 to determine that the second key in Table 3 corresponds to the specific identifier (“my.test.link”) contained in the example database read request. The database interface portion 170 can then form a database request using the sub-level identifier and top-level identifier located in the hash table index 113. The database interface portion 170 can then send the database request to the database 107.
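The lookup described above can be sketched in Python by holding the Table 3 keys in a dictionary. The dictionary layout, the function name `form_database_request`, and the SQL table and column names are hypothetical and are not taken from the description; they only illustrate how a located key could yield a request that carries both identifiers.

```python
# Hypothetical in-memory form of the Table 3 index: identifier -> top-level identifier.
HASH_TABLE_INDEX = {
    "my.test": "my.test",
    "my.test.link": "my.test",
    "my.test.run": "my.test",
    "hello.world": "hello.world",
    "justin.time": "justin.time",
    "outof.time": "justin.time",
}


def form_database_request(identifier, index=HASH_TABLE_INDEX):
    """Look up `identifier` and build a request, or return None if no key matches."""
    top_level = index.get(identifier)
    if top_level is None:
        return None
    return {
        # Illustrative parameterized SQL; `documents` and `top_level_id` are made-up names.
        "sql": "SELECT content FROM documents WHERE top_level_id = ?",
        "params": (top_level,),
        # Kept so the retrieved document can later be narrowed to the requested portion.
        "sub_level": identifier,
    }


print(form_database_request("my.test.link"))
```

Returning `None` for an unknown identifier leaves the caller free to report the failure in whatever way the surrounding application requires.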

In various embodiments, upon receiving the database request, the memory manager 109 of the database 107 can determine if the information corresponding to the identifier is contained in local memory 112 at the memory manager 109. If so, then the memory manager 109 can return the information (for example, XML) associated with the identifier in the database request to the database interface portion 170, without reading the information from the storage device 111. Because the local memory 112 has a faster access time latency than the storage device 111, storing information locally using the memory manager 109 reduces the access time to the client device 102 to obtain the requested information.

If the requested information is not contained in the local memory 112 of the memory manager 109, then the memory manager 109 performs a database read operation to obtain the requested information from the storage device 111. The memory manager 109 also can add the information read from the storage device 111 to a hash table contained in local memory 112, for faster access to the information in response to subsequent requests for it. In various embodiments, the information obtained from the database can comprise the entire file or entire amount of information associated with the top-level identifier. For example, the located key “ID=‘my.test.link’, Top-level=‘my.test’” will result in the database 107 returning the entire file (for example, the XML document) associated with the “my.test” top-level identifier.
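The two paths described above, a hit in local memory and a fall-through to the storage device, amount to a read-through cache. The following Python sketch shows that pattern under simplifying assumptions: a plain dictionary stands in for the storage device 111, the class and method names are hypothetical, and eviction is omitted.

```python
class MemoryManager:
    """Hypothetical sketch of memory manager 109 acting as a read-through cache."""

    def __init__(self, storage):
        self.storage = storage    # dictionary standing in for storage device 111
        self.local_memory = {}    # dictionary standing in for local memory 112
        self.storage_reads = 0    # counts the slower reads, for illustration

    def fetch(self, top_level_id):
        # Hit: return the information without reading the storage device.
        if top_level_id in self.local_memory:
            return self.local_memory[top_level_id]
        # Miss: perform the database read, then keep the result for later requests.
        self.storage_reads += 1
        document = self.storage[top_level_id]
        self.local_memory[top_level_id] = document
        return document


manager = MemoryManager({"my.test": '<task ID="my.test"/>'})
manager.fetch("my.test")
manager.fetch("my.test")
print(manager.storage_reads)
```

The second call is served from local memory, so only one storage read is counted.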

In various embodiments, upon receiving the information from the database 107, the database interface portion 170 can forward the received information to the translator portion 160. The translator portion 160 can apply a third stylesheet 165 that parses the information received from the database 107 to strip out unwanted information prior to presenting or outputting the information to the client device 102. For example, for the database access request comprising the sub-level identifier “my.test.link,” the translator portion 160 can remove all but the following information as shown in Table 4:

TABLE 4
Transformed Database Output Information

<subtask ID="my.test.link">
  <title><cdata>linktest</cdata></title>
  <step> </step>
</subtask>

In this case, for information flowing from the database to the client device, the information obtained from the database 107 can comprise information input to the translator portion 160, and the transformed information provided to the client device can comprise information output by the translator portion 160. The transformed database output information can then be forwarded to the input/output portion 150 and transferred to the client device 102 for further processing such as, for example, display to a user.
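The stripping step that produces Table 4 can be approximated in Python as follows. This sketch uses `xml.etree.ElementTree` rather than the XSLT stylesheet of FIGS. 9A and 9B, and the function name `extract_by_id` is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_by_id(document_xml, identifier, attribute="ID"):
    """Return only the element whose `attribute` equals `identifier`, or None."""
    root = ET.fromstring(document_xml)
    for element in root.iter():
        if element.get(attribute) == identifier:
            # Drop whitespace that follows the element in its parent so that
            # only the element itself is serialized.
            element.tail = None
            return ET.tostring(element, encoding="unicode")
    return None


DOCUMENT = (
    '<task ID="my.test">'
    '<subtask ID="my.test.link"><title>linktest</title></subtask>'
    '<subtask ID="my.test.run"><title>run my test</title></subtask>'
    '</task>'
)
print(extract_by_id(DOCUMENT, "my.test.link"))
```

Requesting the top-level identifier `my.test` returns the whole document, since the root element itself matches.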

Therefore, unlike other databases available for maintaining markup language information, various embodiments comprising a system and method for inserting document text into a database and for retrieving portions of the document text from that database as described herein can provide, among other things, improved speed and efficiency in indexing and searching of information as well as improved speed of information retrieval from a database, because only the desired data is transferred to the requesting device. Further, various embodiments can be implemented using a relatively small number of instructions compared to other systems. While other databases use XPATH mechanisms to extract markup language from a database, various embodiments use unique keys created from attribute names to identify and obtain information from a database. In addition, various embodiments comprising the customized stylesheets allow the user the capability to customize how information is parsed into the database and also how information is displayed to the user.

With respect to FIGS. 9A and 9B, there is shown an example third stylesheet 165 used to obtain the requested XML according to various embodiments. In the example shown in FIGS. 9A and 9B, the stylesheet 165 can cause the translator portion 160 to be configured to obtain the top level XML associated with a top-level identifier or subtask level XML associated with a child identifier. In various embodiments, the third stylesheet 165 can be a .xsl file.

With respect to FIGS. 10A and 10B, there is shown another example third stylesheet 165 according to various embodiments. In the example shown in FIGS. 10A and 10B, the stylesheet 165 can cause the translator portion 160 to be configured to obtain the XML associated with a particular language based on a chosen locale. For example, if information is stored in the database 107 in three different languages (such as, for example, English, French and German), the stylesheet 165 can cause the translator portion 160 to obtain only the French version, if the user requested the French version and the French version is available. The XML for other locales is removed from the information provided to the requesting client device 102.
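A locale filter of this kind can be sketched in Python as shown below. The attribute name `locale`, the function name `keep_locale`, and the sample document are assumptions, since the stylesheet of FIGS. 10A and 10B is not reproduced here. The sketch also does not handle the case in which the requested locale is unavailable.

```python
import xml.etree.ElementTree as ET


def keep_locale(document_xml, locale, attribute="locale"):
    """Remove elements tagged with a locale other than the requested one."""
    root = ET.fromstring(document_xml)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            tagged = child.get(attribute)
            # Elements without the attribute are locale-neutral and are kept.
            if tagged is not None and tagged != locale:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


DOCUMENT = (
    '<task ID="greeting">'
    '<title locale="en">Hello</title>'
    '<title locale="fr">Bonjour</title>'
    '<title locale="de">Hallo</title>'
    '<step/>'
    '</task>'
)
print(keep_locale(DOCUMENT, "fr"))
```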

With respect to FIGS. 11A and 11B, there is shown yet another example third stylesheet 165 according to various embodiments. In the example shown in FIGS. 11A and 11B, the stylesheet 165 can cause the translator portion 160 to be configured to obtain the XML associated with a particular item of equipment or version of equipment. For example, if information is stored in the database 107 for different versions of a document, the stylesheet 165 can cause the translator portion 160 to obtain the latest version.

In various embodiments, the stylesheets 165 of FIGS. 10A and 10B, and 11A and 11B, can be applied after the stylesheet 165 of FIGS. 9A and 9B obtains the appropriate XML. In various embodiments, for processing of any stylesheet 165, elements encountered during translation that do not contain the requested attributes, or that do not match, can be returned to the requesting client device 102.

With respect to FIGS. 6A and 6B, there is shown a database insertion method 600 according to various embodiments. As shown in FIG. 6A, the database insertion method 600 can commence at 601. The method can proceed to 603, at which the user selects a file for database insertion. The selection can be accomplished, for example, by entering a file identifier such as, for example, a file name, into a data entry field of an interactive page at the client device 102. The interactive page can comprise an HTML page, for example. The user can cause the client device 102 to send the file to the database servlet (for example, the information storage and retrieval application 105) at the server 101 by actuating a button provided on the interactive page. Upon user actuation of the upload command or button, the client device 102 can transfer the file to the servlet, at 605.

Control can then proceed to 607, at which the file for database insertion can be received by the input/output portion 150 of the database servlet. Upon recognizing a file for database insertion, the input/output portion 150 can forward the file to the translator portion 160. Control can then proceed to 609, at which, upon receiving the input information (for example, the file for database insertion), the translator portion 160 can select the first stylesheet 165. In various embodiments, the first stylesheet 165 can be retrieved from a memory of the server 101 or using the network 103. Control can then proceed to 611, at which the translator portion 160 can apply the first stylesheet 165 to the received input information to generate a key for each occurrence in the input information of one of the attributes specified in the first stylesheet 165. In various embodiments, the key can comprise one or more identifiers. Control can then proceed to 613, at which the translator portion 160 can construct a hierarchy of related identifiers as the keys are generated. In various embodiments, the keys can comprise, for example, a first sub-level identifier and another identifier that is the immediately preceding level identifier to which the first sub-level identifier belongs. Control can proceed to 615, at which the translator portion 160 can determine if the end of the input information has been reached (for example, end of file). If not, then control can return to 611 to search for the next attribute in the input information selected by the first stylesheet 165, until keys have been generated for all matching attributes found in the input information.

Control can then proceed to 617, at which the translator portion 160 can select the second stylesheet 165. In various embodiments, the second stylesheet 165 can be retrieved from a memory of the server 101 or using the network 103. Referring to FIG. 6B, control can then proceed to 619, at which the translator portion 160 can generate the output information 302 using the identifiers and keys determined at 611 and 613 in accordance with the second stylesheet 165. In various embodiments, the output information 302 can comprise a hash table index 113. Control can then proceed to 621, at which the translator portion 160 can store the hash table index 113 in memory manager 109 local memory 112.

Control can then proceed to 623, at which the database interface portion 170 can retrieve the insertion instruction page 303 from the database 107. The insertion instruction page 303 can comprise a markup language file such as, for example, a HyperText Markup Language (HTML) page. Control can then proceed to 625, at which the database interface portion 170 can apply the insertion instruction page 303 to select the insertion mode for adding the input information 301 into the database 107. Control can proceed to 627, 629, or 631 for insertion of the input information 301 into the database 107 in accordance with the insertion instruction page 303. For example, at 627, the database interface portion 170 can format the input information 301 for insertion into the database 107 without using any compression. Alternatively, at 629, the database interface portion 170 can format the input information 301 for insertion into the database 107 by performing data compression of the input information 301 as a single document. In various embodiments, the input information 301 can be compressed using a compression algorithm such as, for example, the java.util.zip compression utility. Alternatively, at 631, the database interface portion 170 can format the input information 301 for insertion into the database 107 by performing data compression of the input information 301 as multiple distinct files. For example, if the input information 301 is received as a single ZIP file, then the database interface portion 170 can unzip the ZIP file and insert individually each compressed file that is included in the ZIP file. In various embodiments, the database interface portion 170 can be configured to insert the input information 301 into the database 107 using the METHOD=“POST” HTML instruction.

Control can then proceed to 633, at which the database interface portion 170 can store in, or upload to, the database 107, the input information 301 from 627 or the compressed input information 301 from 629 or 631 as either a single document or file, or as several compressed documents or files.

With respect to FIGS. 7A and 7B, there is shown a database retrieval method 700 according to various embodiments. As shown in FIG. 7A, the database retrieval method 700 can commence at 701. The method can proceed to 703, at which the client device 102 prepares and sends a database read request to the server 101. In various embodiments, the client device 102 can prepare and send the database read request in response to receiving a request for information from, for example, an application, or in response to a user request received via a user interface. In various embodiments, the client device 102 can submit the database read request comprising a specific identifier to be obtained from the database 107. For example, the database read request can comprise the sub-level identifier “ID=‘my.test.link’” or other specific identifier to be obtained from the database 107.

The method can then proceed to 705, at which the information storage & retrieval application (for example, database servlet) can receive the database read request from the client device 102. For example, the database read request can comprise the sub-level identifier “ID=‘my.test.link’”. In particular, upon receiving the database read request from the client device 102, the input/output portion 150 can forward the database read request to the database interface portion 170.

Control can then proceed to 707, at which, upon receiving the database read request, the database interface portion 170 can search the keys in the hash table index 113, via table look-up or other method, for the identifier contained in the database read request. For example, the database interface portion 170 can perform a table look-up of the keys in the hash table index 113 to determine the key that corresponds to the specific identifier contained in the database read request. Control can then proceed to 709, at which the database interface portion 170 can determine if the hash table index 113 contains a key matching the specific identifier contained in the database read request. If not, control can proceed to 711, at which the database interface portion 170 can send (via the input/output portion 150) an error message to the client device 102 indicating no matching entry in the database 107. In various embodiments, the error message can comprise an HTTP response indicating request failure.

If a key is located within the hash table index 113, then control can proceed to 713, at which the database interface portion 170 can form a database request using the sub-level identifier, if received, and the top-level identifier located in the hash table index 113, and then send the database request to the database 107.
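By way of non-limiting illustration, the look-up at 707 through 713 can be sketched in Python as follows. The layout of the index (a mapping from a sub-level identifier to its top-level identifier), the name build_database_request, and the dictionary form of the database request are assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical hash table index: sub-level identifier -> top-level identifier.
HASH_TABLE_INDEX = {
    "my.test.link": "my.test",
    "my.test.title": "my.test",
}


def build_database_request(read_request_id: str, index: dict) -> Optional[dict]:
    """Look up the requested identifier and form a database request (steps 707-713)."""
    top_level = index.get(read_request_id)   # table look-up of the keys
    if top_level is None:
        # Step 711: no matching key; the caller would return an error response.
        return None
    # Step 713: combine the sub-level and top-level identifiers.
    return {"ID": read_request_id, "Top-level": top_level}
```

For the identifier “my.test.link”, such a sketch yields a request naming both the sub-level identifier and the “my.test” top-level identifier, while an identifier absent from the index yields no request.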

Control can then proceed to 715, at which, upon receiving the database request, the memory manager 109 of the database 107 can determine if the information corresponding to the identifier is contained in local memory 112 at the memory manager 109. If so, then control can proceed to 717, at which the memory manager 109 can return the information (for example, XML) associated with the identifier in the database request to the database interface portion 170, without reading the information from the storage device 111. Because the local memory 112 has a lower access latency than the storage device 111, storing information locally using the memory manager 109 reduces the time required for the client device 102 to obtain the requested information.

If the requested information is not contained in the local memory 112 of the memory manager 109, then control can proceed to 719, at which the memory manager 109 performs a database read operation to obtain the requested information from the storage device 111. In various embodiments, the information obtained from the database storage device 111 can comprise the entire file or entire amount of information associated with the top-level identifier. For example, the located key “ID=‘my.test.link’, Top-level=‘my.test’” will result in the database 107 returning the entire file (for example, XML document) associated with the “my.test” top-level identifier.
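By way of non-limiting illustration, the decision at 715 through 719 can be sketched in Python as follows. The class name MemoryManager, the use of dictionaries to stand in for the local memory 112 and the storage device 111, and the storage_reads counter are assumptions made for illustration only.

```python
class MemoryManager:
    """Sketch of a read path that prefers local memory over the storage device."""

    def __init__(self, storage_device: dict):
        self.local_memory = {}                # identifier -> cached information
        self.storage_device = storage_device  # top-level identifier -> full document
        self.storage_reads = 0                # counts reads of the slower device

    def read(self, request: dict):
        """Return (information, served_from_local_memory) for a database request."""
        identifier = request["ID"]
        if identifier in self.local_memory:
            # Steps 715 and 717: serve from local memory, no storage access.
            return self.local_memory[identifier], True
        # Step 719: read the entire document for the top-level identifier.
        self.storage_reads += 1
        return self.storage_device[request["Top-level"]], False
```

In such a sketch, a first request for “my.test.link” reads the whole “my.test” document from storage, whereas a later request for the same identifier can be served from local memory once the corresponding information has been stored there.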

Control can then proceed to 721, at which, upon receiving the information from the database 107, the database interface portion 170 can forward the received information to the translator portion 160. The translator portion 160 can apply a third stylesheet 165 that parses the information received from the database 107 to strip out unwanted information prior to presenting or outputting the information to the client device 102, such that the transformed information returned to the client device 102 is only the information associated with the selected sub-level identifier, and not the remaining information in the document stored in the database 107. Therefore, only the information needed by the client device 102 is actually transferred to the client device 102, resulting in more efficient and timely responses to database requests. In various embodiments, the translator portion 160 can be configured to perform an XSL translation that results in only pertinent data being obtained. For example, the translator portion 160 can be configured to extract information by identifier and by attributes passed to the database 107. Values that do not agree with the attributes can be removed. Elements that do not contain the attributes, or that match the attributes, can be passed back to the client device 102.
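By way of non-limiting illustration, the effect of the transformation at 721 can be sketched in Python as follows. The sketch uses the standard xml.etree.ElementTree module rather than an XSL stylesheet, so it only approximates the result of the third stylesheet 165; the function name extract_by_id, the attribute name ID, and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical document stored under the "my.test" top-level identifier.
SAMPLE_DOCUMENT = (
    '<doc ID="my.test">'
    '<link ID="my.test.link">Click here</link>'
    '<title ID="my.test.title">Test page</title>'
    '</doc>'
)


def extract_by_id(document_xml: str, sub_level_id: str) -> Optional[str]:
    """Return only the element whose ID attribute matches, serialized as XML."""
    root = ET.fromstring(document_xml)
    for element in root.iter():
        if element.get("ID") == sub_level_id:
            element.tail = None  # drop any text that follows the element
            return ET.tostring(element, encoding="unicode")
    return None  # no element carries the requested identifier
```

Applied to the sample document with the identifier “my.test.link”, such a sketch returns the link element alone, so the title element and the enclosing document are not sent to the client device 102.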

Referring to FIG. 7B, control can then proceed to 723, at which the memory manager 109 can also add the transformed information to the hash table in local memory 112, for faster access to the information in response to subsequent requests for it. Upon adding the transformed information to the hash table, control can then proceed to 725, at which the memory manager 109 can determine whether or not a looping timeframe is met. For example, the memory manager 109 can maintain a counter that is incremented each time information is added to the hash table. Upon the counter reaching a predetermined number (for example, a parameter specifying the number of iterations or “loops” to occur before setting the looping timeframe to an active state), control can proceed to 727, at which the memory manager 109 can determine if the local memory 112 size has exceeded a target threshold size. If the looping timeframe is not set (for example, the number of iterations has not yet been reached), then control can proceed to 731.

If at 727 the memory manager 109 determines that the local memory 112 size has exceeded the target threshold size, then control can proceed to 729, at which the memory manager 109 can remove the oldest information in local memory 112 to provide capacity to store the transformed information and maintain the size of the local memory 112 below the target threshold size. In various embodiments, the target threshold is configurable and can be modified by, for example, updating an input parameter specifying the target size threshold contained in a configuration file.
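By way of non-limiting illustration, the size check and removal of the oldest information at 723 through 729 can be sketched in Python as follows. The class name BoundedLocalMemory, the measurement of size as a sum of string lengths, and the resetting of the counter after each check are assumptions made for illustration; the description above does not specify these details.

```python
from collections import OrderedDict


class BoundedLocalMemory:
    """Sketch of a hash table that drops its oldest entries once it grows too large."""

    def __init__(self, target_size: int, loop_count: int):
        self.entries = OrderedDict()    # insertion order doubles as age
        self.target_size = target_size  # hypothetical threshold, in characters
        self.loop_count = loop_count    # additions between size checks
        self.additions = 0

    def size(self) -> int:
        return sum(len(value) for value in self.entries.values())

    def add(self, identifier: str, transformed: str) -> None:
        self.entries[identifier] = transformed           # step 723
        self.additions += 1
        if self.additions < self.loop_count:             # step 725: not yet met
            return
        self.additions = 0                               # assumed: restart the count
        # Steps 727 and 729: remove oldest entries while the threshold is exceeded,
        # always keeping the entry that was just added.
        while self.size() > self.target_size and len(self.entries) > 1:
            self.entries.popitem(last=False)
```

With a target size of 10 and a loop count of 2, for example, adding four five-character entries leaves only the two most recent entries in the table after the second size check.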

Control can then proceed to 731, at which the input/output portion 150 can send the transformed database information to the client device 102 for further processing such as, for example, display to a user. In various embodiments, the transformed information obtained from the database can be output to the client device 102 as an HTTP response. Control can then proceed to 733, at which the method can end.

Thus has been disclosed a system and method for inserting document text into a database and for retrieving portions of the document text from that database. The system and method can provide, among other things, improved speed and efficiency in indexing and searching of information as well as improved speed of information retrieval from a database, because only the desired data is transferred to the requesting device.

Various embodiments can be implemented using hardware and software components including the PC and related peripherals as described herein. However, it is further apparent to those skilled in the art that the disclosed system may be readily implemented in software using object or object-oriented software development environments that provide portable source code that can be used on a variety of computer platforms. Alternatively, the disclosed system may be implemented partially or fully in hardware using standard logic circuits or a VLSI design. Other hardware or software can be used to implement the systems in accordance with this invention depending on the speed and/or efficiency requirements of the systems, the particular function, and/or a particular software or hardware system, microprocessor, or microcomputer system being utilized. The system and method herein can be readily implemented in hardware and/or software using any known or later developed systems or structures, devices and/or software by those of ordinary skill in the applicable art from the functional description provided herein and with a general basic knowledge of the computer and mark-up language arts.

Moreover, the disclosed methods may be readily implemented in software executed on a programmed general-purpose computer, a special purpose computer, a microprocessor, or the like. In these instances, the systems and methods of this invention can be implemented as a program embedded on a personal computer, such as a Java™ or CGI script, as a resource residing on a server or graphics workstation, as a routine embedded in a dedicated encoding/decoding system, or the like. The system can also be implemented by physically incorporating the system and method into a software and/or hardware system, such as the hardware and software systems of an image processor.

While embodiments of the invention have been described above, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the applicable arts. Accordingly, the embodiments of the invention, as set forth above, are intended to be illustrative, and should not be construed as limitations on the scope of the invention. Various changes may be made without departing from the spirit and scope of the invention. Accordingly, the scope of the present invention should be determined not by the embodiments illustrated above, but by the claims appended hereto and their legal equivalents.

Claims

1. A method for processing information from a document using a database, comprising the steps of:

associating a tag with one of a plurality of identifiers;

generating a hash table comprising a plurality of levels, wherein each one of said plurality of levels is hierarchically related to another one of said plurality of levels, and wherein each one of said plurality of levels is associated with one of said plurality of identifiers;

receiving the document comprising text formatted in accordance with a first markup language;

determining each occurrence of one of the plurality of identifiers within the text by searching the text for a text stream that matches the identifier using a stylesheet; and

generating a hash table index comprising at least one key, wherein the at least one key comprises at least one of the plurality of identifiers.

2. The method of claim 1, further comprising:

retrieving the tag associated with one of the plurality of identifiers at one or more of the plurality of levels.

3. The method of claim 2, wherein retrieving the tag further comprises:

retrieving the tag from a local memory of the database if the tag is stored therein; and

retrieving the tag from a hard disk of the database if the tag is not stored in the local memory.

4. The method of claim 1, wherein the tag comprises text formatted in accordance with the first markup language.

5. The method of claim 4, wherein the first markup language is extensible markup language (XML).

6. The method of claim 5, further comprising:

selecting the at least one identifier; and

modifying the stylesheet to use the at least one selected identifier.

7. The method of claim 5, wherein the plurality of levels comprises one or more top levels and at least one sublevel associated with each one of the top levels.

8. The method of claim 7, further comprising:

associating the document with one of the top levels.

9. The method of claim 1, further comprising:

performing data compression on the document text to form a compressed document.

10. The method of claim 9, further comprising:

inserting the document text into the local memory and the hard disk of the database according to an insertion page comprising database insertion instructions provided in accordance with a second markup language.

11. The method of claim 10, further comprising:

determining whether or not the local memory size has exceeded a target threshold size; and

removing, if the target threshold size has been exceeded, the oldest information from the local memory to maintain the local memory size below the target threshold size.

12. A system for processing information using a database, comprising:

an information storage and retrieval application configured to receive markup language information and database requests from a client device and further comprising a translator portion configured to generate a key based on each occurrence of a selected attribute occurring in a file, the selected attribute being specified using a first stylesheet; and

a database coupled to the information storage and retrieval application and further comprising a memory manager.

13. The system of claim 12, wherein the memory manager further comprises:

a local memory including a hash table index and a hash table;

wherein the translator portion is configured to form the key using at least one identifier associated with the selected attribute and to add one or more keys to the hash table index in accordance with a second stylesheet.

14. The system of claim 13, wherein the at least one identifier comprises a top-level identifier and at least one sub-level identifier, and wherein the at least one sub-level identifier and the top-level identifier are hierarchically related.

15. The system of claim 14, wherein the top-level identifier is associated with a document comprising input information, and wherein each of the at least one sub-level identifiers is identified with a portion of the input information.

16. The system of claim 13, wherein the translator portion is further configured to insert input information into the database in accordance with an insertion instruction page, and to transform information received from the database in accordance with a third stylesheet.

17. A computer-readable medium upon which is embodied a sequence of programmable instructions which when executed by a processor cause the processor to perform functions comprising:

receiving a document comprising text formatted in accordance with a first markup language;

associating a tag with one of a plurality of identifiers, wherein the tag comprises document text;

generating a hash table comprising a plurality of levels, wherein each one of said plurality of levels is hierarchically related to another one of said plurality of levels, and wherein each one of said plurality of levels is associated with one of said plurality of identifiers;

determining each occurrence of one of the plurality of identifiers within the text by searching the text for a text stream that matches the identifier using a stylesheet;

generating a hash table index comprising at least one key, wherein the at least one key comprises at least one of the plurality of identifiers;

retrieving the tag associated with one of the plurality of identifiers at one or more of the plurality of levels, further comprising retrieving the tag from a local memory of a database if the tag is stored therein and retrieving the tag from a hard disk of the database if the tag is not stored in the local memory;

associating the document with one of the top levels;

performing data compression on the document text to form compressed document text; and

inserting the compressed document text into the local memory and the hard disk of the database according to an insertion page comprising database insertion instructions provided in accordance with a second markup language.

18. The computer-readable medium of claim 17, wherein the instructions further comprise:

determining whether or not the local memory size has exceeded a target threshold size; and

removing, if the target threshold size has been exceeded, the oldest information from the local memory to maintain the local memory size below the target threshold size.

19. The computer-readable medium of claim 17, wherein the plurality of levels comprises one or more top levels and at least one sublevel associated with each one of the top levels.

20. The computer-readable medium of claim 17, wherein the tag comprises text formatted in accordance with the first markup language, and wherein the first markup language is extensible markup language (XML).