FLEXIBLE FULLY INTEGRATED REAL-TIME DOCUMENT INDEXING

Info

Publication number: 20120089612
Type: Application
Filed: Sep 28, 2011
Publication Date: Apr 12, 2012
Applicant: NOLIJ CORPORATION (Beverly, MA)
Inventors: John J. Collins (South Hamilton, MA), Sean J. Langford (Middleton, MA)
Application Number: 13/247,536

Abstract

A system for real-time document indexing is provided that includes a browser that is executing on a client system. The browser includes functionalities allowing it to communicate with a remote computer system. A query interface executes within the framework of the browser. The query interface receives one or more query searches from an end-user and sends the one or more query searches to be processed by the remote computer system. The remote computer system sends to the query interface the results of the one or more query searches via the browser. The query interface assigns the results of the one or more query searches to a folder where the folder includes a unique identifier. The query interface indexes the results of the one or more query searches to the unique identifier of the folder.

Description

PRIORITY INFORMATION

This application claims priority from provisional application Ser. No. 61/392,252 filed Oct. 12, 2010, which is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

The invention is related to the field of network applications, and in particular to a browser-based document searching and indexing system.

Traditional document management, which requires extensive use of locally attached scanning devices, file servers, databases, and the like, has long been rooted in the technology of yesterday: namely, Windows-based client/server technology. A new era of computing is being ushered in, one that revolves around portability, cross-platform support, and a new and growing installed base of mobile devices driven by Google, the resurgent Apple Inc., and others. Leveraging this increasingly adopted technology environment requires redesigning and, in many cases, reinventing wheels that no longer fit. Unfortunately, the enterprise document management space is largely tied to legacy client/server architecture. While most vendors offer a lightweight web client with limited functionality, presenting a veneer of modernity, the bulk of the underlying technology is built on traditional Windows-based client/server architecture. As Java technology continues to evolve and to account for a growing share of development projects worldwide, more and more leading software companies are leveraging its portability and multi-platform capabilities to prepare for the next-generation Internet, one based on portability, open standards, and mobile device support. Providing full document management functionality without traditional client/server technology, while supporting any web browser on any platform along with a wide range of mobile devices, presents a number of unique challenges.

SUMMARY OF THE INVENTION

According to one aspect of the invention, there is provided a system for real-time document indexing. The system includes a browser that is executing on a client system. The browser includes functionalities allowing it to communicate with a remote computer system. A query interface executes within the framework of the browser. The query interface receives one or more query searches from an end-user and sends the one or more query searches to be processed by the remote computer system. The remote computer system sends to the query interface the results of the one or more query searches via the browser. The query interface assigns the results of the one or more query searches to a folder where the folder includes a unique identifier. The query interface indexes the results of the one or more query searches to the unique identifier of the folder.

According to another aspect of the invention, there is provided a method of performing real-time document indexing. The method includes providing a browser that is executing on a client system. The browser includes functionalities allowing it to communicate with a remote computer system. Also, the method includes executing within the framework of the browser a query interface. The query interface receives one or more query searches from an end-user and sends the one or more query searches to be processed by the remote computer system. The remote computer system sends to the query interface the results of the one or more query searches via the browser. The query interface assigns the results of the one or more query searches to a folder where the folder includes a unique identifier. The query interface indexes the results of the one or more query searches to the unique identifier of the folder.

According to another aspect of the invention, there is provided a computer readable medium for storing a program being executed on a computer system. The program performs a method of performing real-time document indexing. The method includes providing a browser that is executing on a client system. The browser includes functionalities allowing it to communicate with a remote computer system. Also, the method includes executing within the framework of the browser a query interface. The query interface receives one or more query searches from an end-user and sends the one or more query searches to be processed by the remote computer system. The remote computer system sends to the query interface the results of the one or more query searches via the browser. The query interface assigns the results of the one or more query searches to a folder where the folder includes a unique identifier. The query interface indexes the results of the one or more query searches to the unique identifier of the folder.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram illustrating the inventive integrated real-time document searching and indexing system;

FIG. 2 is a process flow illustrating the full lifecycle of the real-time database searching process used in accordance with the invention; and

FIG. 3 is a process flow illustrating the full lifecycle of the real-time database indexing process used in accordance with the invention.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides a novel database searching and indexing process. The invention uses a query interface by which folders and documents are retrieved for viewing and indexing. The query interface retrieves records from an external database in real-time. Real-time database retrieval allows documents and associated metadata to be returned and used without concern for data latency, a common problem with other document imaging/document management solutions that create a copy of external data in a local ‘imaging’ database for searching purposes.

FIG. 1 is a schematic diagram illustrating the inventive integrated real-time document searching and indexing system 2. The document searching and indexing system 2 uses a client browser 4 that executes on a client system 40. The client system 40 includes a processor, portable storage, and a processor storage medium. The client browser 4 runs a browser-based query interface 6 within its framework. The query interface 6 can be an applet or other browser-based application running within the framework of the client browser 4, and can be written in a platform-independent programming language such as Java or the like. The document searching and indexing system 2 also includes a server system 8, which executes a query engine 32. The server system 8 includes a processor, portable storage, and a processor storage medium. The query engine 32 can be a servlet or other server-based application running on the server system 8, and can likewise be written in a platform-independent programming language such as Java or the like.

Before using the query interface 6, the end-user must be authenticated by logging in to it. Once the end-user is authenticated, the query interface 6 sends a request 14 to the server 8 asking for a connection. The client system 40 and the server 8 communicate using commonly known client-server communication protocols such as TCP/IP or the like. In other embodiments of the invention, a plurality of client systems communicate with the server 8. Once a connection between the client system 40 and the server 8 has been established, the query engine 32 sends all assigned and configured queries to the query interface 6 via a message 16.
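The login-then-deliver handshake can be sketched as follows. This is a minimal illustration, not the patented implementation: `QueryRegistry`, `SearchQuery`, and the plaintext password map are hypothetical stand-ins for whatever authentication and configuration store the query engine 32 actually uses.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical descriptor of a configured search query (folder-level, batch, etc.).
record SearchQuery(String id, String type, List<String> fieldIds) {}

// Hypothetical server-side registry standing in for the query engine's configuration.
class QueryRegistry {
    private final Map<String, String> passwords;           // user -> password (plaintext for illustration only)
    private final Map<String, List<SearchQuery>> assigned;  // user -> assigned queries

    QueryRegistry(Map<String, String> passwords, Map<String, List<SearchQuery>> assigned) {
        this.passwords = passwords;
        this.assigned = assigned;
    }

    // Authenticates the user; on success returns every query assigned to that user,
    // which the query interface then renders. Unknown users and bad passwords fail.
    Optional<List<SearchQuery>> login(String user, String password) {
        if (!password.equals(passwords.get(user))) {
            return Optional.empty();
        }
        return Optional.of(assigned.getOrDefault(user, List.of()));
    }
}
```

A real deployment would verify hashed credentials and run over an authenticated transport; the sketch only shows that query delivery is gated on successful login.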

Using the query interface 6, the end-user can enter search criteria into one or more provided fields 28, such as name, date of birth (DOB), invoice, or the like, in the currently active query. The query interface 6 allows multiple queries to be performed and fully supports all database-supported wildcards. The end-user executes the query within the query interface 6 by pressing Enter or issuing another command that activates the query.

The query interface 6 then prepares the activated query for transmission to the server 8. It parses and collects all the search fields 28 and formulates a message 18 containing the query contents and search fields 28. The query interface 6 sends the message 18 to the server 8, which initiates a number of threads to process it. The server 8 processes the message 18 by formulating a SQL statement based on its contents, and the SQL statement is provided to the query engine 32 for further processing.

The system 2 includes a metadata storage module 10 and an enterprise document repository 12. The metadata storage module 10 and the enterprise document repository 12 are database structures that are remotely connected to the server 8. Because each query is an autonomous search object, it can be connected to almost any external SQL database. Once the query engine 32 determines to which of the database structures 10, 12 the SQL statement should be sent, the server 8 establishes a remote connection with that database structure. Once the connection is established, the server 8 sends the SQL statement via message 34 or 36 for processing by the respective database 10, 12. In other embodiments of the invention, a collection of queries can be used to provide access to a number of different databases through a single seamless search interface.
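The idea of a query as an "autonomous search object" that carries its own database target can be sketched as below. This is an assumption-laden illustration: `AutonomousQuery` and `QueryRouter` are hypothetical names, and each database is modeled as a plain function from SQL text to rows so the example runs without a JDBC driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical autonomous search object: each query names its own target database.
record AutonomousQuery(String name, String connectionString, String sql) {}

class QueryRouter {
    // Stand-in for "open a connection and execute": maps a connection string to a
    // function from SQL text to result rows. A real system would use JDBC here.
    private final Map<String, Function<String, List<String>>> databases;

    QueryRouter(Map<String, Function<String, List<String>>> databases) {
        this.databases = databases;
    }

    // Routes one query to the database named by its own connection string.
    List<String> execute(AutonomousQuery q) {
        Function<String, List<String>> db = databases.get(q.connectionString());
        if (db == null) {
            throw new IllegalArgumentException("No database for " + q.connectionString());
        }
        return db.apply(q.sql());
    }

    // A collection of queries puts several databases behind one search interface.
    List<String> executeAll(List<AutonomousQuery> queries) {
        List<String> merged = new ArrayList<>();
        for (AutonomousQuery q : queries) {
            merged.addAll(execute(q));
        }
        return merged;
    }
}
```

Because routing information lives in each query rather than in the server, adding a new external database only means configuring new queries, which is the property the paragraph above attributes to autonomous search objects.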

After either of the databases 10, 12 executes the SQL statement, the results are collected and managed by the query engine 32 using the messages 34, 36. Once all the results have been provided to the query engine 32, the query engine 32 packages them in a message 20. The server 8 sends the message 20 to the client browser 4 on the client computer system 40, where it is received by the query interface 6. The query interface 6 presents the results to the end-user in an understandable form, such as a sortable list that is simple to browse.
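A sortable result list of the kind described above might look like the following sketch. It assumes, hypothetically, that each result row arrives as a map from column name to string value; `ResultsPackage` and `sortedBy` are illustrative names, not part of the described system.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical results package: one row per matching record, keyed by column name.
record ResultsPackage(List<Map<String, String>> rows) {

    // Returns a copy of the rows sorted by one column, as a sortable list in the
    // query interface might do when the user clicks a column header.
    // Missing values sort as empty strings; comparison is plain string order.
    List<Map<String, String>> sortedBy(String column, boolean ascending) {
        Comparator<Map<String, String>> cmp =
            Comparator.comparing(row -> row.getOrDefault(column, ""));
        if (!ascending) {
            cmp = cmp.reversed();
        }
        List<Map<String, String>> copy = new ArrayList<>(rows);
        copy.sort(cmp);
        return copy;
    }
}
```

Sorting a copy on the client keeps the server's result set untouched, so re-sorting by another column needs no new round trip.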

Moreover, the invention allows for real-time indexing without requiring extensive work by the end-user. Indexing is simplified by using external database records retrieved via real-time searching, with all metadata maintained in its original database rather than copied to a third-party database or location. This approach eliminates the common problems of data latency and synchronization.

When search or query results are provided by the server 8 to the query interface 6 for display, the query interface 6 places them in a display folder 24 that contains the results 26. The results 26 can include actual documents retrieved from the enterprise document repository 12 or their respective metadata from the metadata storage module 10. The end-user can select the folder 24 to view as well as to index its contents 26.

Depending on the end-user's needs, the documents to be indexed may be hardcopy documents, which must be scanned, or existing electronic files, which must be uploaded. Once a hardcopy document is accessible, scanning operations are performed on it. The query interface 6 receives confirmation of the completed scanning process and displays that confirmation to the end-user, confirming the completion of the indexing process. The query interface 6 attaches a unique ID, or key value, to the folder 24, so that all processed documents are indexed to the unique ID of their target folder. The key aspect of the invention is that one or more documents can be automatically indexed to whatever unique ID or key value is associated with a real-time database search query simply by associating those documents with a folder. Any digital copy of a document can be indexed in accordance with the invention.
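Claims 17 and 28 state that the unique key value may be derived by hashing or concatenating data elements. The sketch below shows both options and the "index by adding to a folder" step. The `|` delimiter, the choice of SHA-256, and the `Folder` class itself are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical folder built from one real-time search result row.
class Folder {
    final String uniqueId;
    final List<String> documentIds = new ArrayList<>();

    Folder(String uniqueId) {
        this.uniqueId = uniqueId;
    }

    // Derives a key by concatenating data elements with an assumed "|" delimiter.
    static String concatKey(List<String> elements) {
        return String.join("|", elements);
    }

    // Derives a fixed-length key by SHA-256 hashing the concatenated elements.
    static String hashKey(List<String> elements) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(concatKey(elements).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Adding a document to the folder is the indexing step: the returned entry ties
    // the document to the folder's unique ID; no record metadata is copied locally.
    Map.Entry<String, String> add(String documentId) {
        documentIds.add(documentId);
        return Map.entry(documentId, uniqueId);
    }
}
```

Concatenation keeps the key human-readable; hashing gives a fixed-length key that does not expose the underlying values. Either way the key points back to the external record rather than duplicating it.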

The real-time database searching process is perhaps the most central of the embedded processes, as it forms the backbone of virtually all processing tasks within the query interface 6. The query interface 6 is the primary mechanism by which folders and documents are retrieved for viewing and other purposes. Its greatest advantage is its ability to retrieve records from an external database in real-time, which avoids the data latency that affects solutions that search a local copy of external data.

The invention also allows an end-user to perform secondary indexing based on document type, in addition to the assigned folder unique ID. If the end-user chooses secondary indexing, one or more documents are selected from the folder contents list using the query interface 6. Because the query interface 6 can apply secondary indexing to several documents at once, this saves a considerable amount of time.

There are two ways to perform secondary indexing: manual and pick list. The query interface 6 provides the user interface components for both. In manual secondary indexing, the end-user left-clicks on the selected group of documents and keys in a document type. In pick list secondary indexing, the end-user right-clicks on the selected group of documents to display a pre-defined list of document types and selects one of them.

FIG. 2 is a process flow 50 illustrating the full lifecycle of the real-time database searching process used in accordance with the invention. As with most embedded functionality, an end-user must log into the query interface 6 to begin working with documents and associated metadata, as shown in step 51. Each end-user can have any number and type (e.g. folder level, document specific, batch processing) of search queries, depending on the nature and type of searching that user performs. Once the end-user is authenticated, the query interface 6 delivers and renders all assigned and configured search queries, as shown in step 52.

To begin, an end-user enters search criteria into one or more of the available fields 28 (e.g. name, DOB, invoice number) in the currently active query, as shown in step 53. If multiple queries are available, the end-user simply selects any field from any displayed query to activate that query for searching (inactive queries are grayed out). The query interface 6 fully supports all database-supported wildcards, including partial searches, Boolean logic, date ranges, and more. A query is executed by user selection within the query interface 6, typically by pressing the Return (Mac) or Enter (Windows) key, as shown in step 54.

Once a query has been activated, the query interface 6 must create a Data Transfer Object (DTO) request package for delivery to the server 8. The first step in processing a query submission is to loop through and parse all search fields in the active query, as shown in step 55. The query interface 6 checks for more search fields, as shown in step 56, and if one is found, the field ID (unique field identifier) and associated search criteria (e.g. value, partial value, wildcard) are collected, as shown in step 57. The collected field ID and data are added to the DTO request package, as shown in step 58, and the process continues until all search fields have been parsed. Once the last search field has been processed and added, the DTO request package is submitted to the server 8 for execution, as shown in step 59. The server 8 receives the DTO request package and initiates a thread to process the request, as shown in step 60. The multi-threaded server 8 is designed to receive and process any number of search requests in parallel to ensure consistent response times across all system end-users.
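Steps 55 to 60 can be sketched as a field loop that builds the request package, followed by a fixed thread pool that processes packages in parallel. The record names, the pool size, and the echo handler are hypothetical; skipping blank fields is also an assumption, since the text only says populated criteria are collected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical search field as rendered in the active query.
record SearchField(String fieldId, String criteria) {}

// Hypothetical DTO request package: field ID -> search criteria, in field order.
record RequestPackage(String queryId, Map<String, String> criteria) {}

class SearchFlow {
    // Steps 55-58: loop over the fields and collect each field ID with its criteria.
    // Blank fields are skipped (an assumption of this sketch).
    static RequestPackage build(String queryId, List<SearchField> fields) {
        Map<String, String> criteria = new LinkedHashMap<>();
        for (SearchField f : fields) {
            if (f.criteria() != null && !f.criteria().isBlank()) {
                criteria.put(f.fieldId(), f.criteria());
            }
        }
        return new RequestPackage(queryId, Collections.unmodifiableMap(criteria));
    }

    // Step 60: each request runs on a pooled worker thread so many searches proceed
    // in parallel. The placeholder handler returns "queryId:fieldCount".
    static List<String> serve(List<RequestPackage> requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (RequestPackage r : requests) {
                pending.add(pool.submit(() -> r.queryId() + ":" + r.criteria().size()));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : pending) {
                try {
                    out.add(f.get()); // results are gathered in submission order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("search failed", e.getCause());
                }
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

A bounded pool caps concurrent database work, which is one plausible way to keep response times consistent under load; the text does not specify the threading model beyond "multi-threaded".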

The first step in processing the search DTO request package is to parse the included query fields, substitute all search criteria where required, and generate a master SQL statement based on the query fields, as shown in step 61. Each query field ID points back to a master table that identifies the table and schema as well as any foreign and primary key relationships. Once the finished SQL statement is ready for execution, the server 8 uses the query engine 32 to open the assigned remote database connection with a unique database connection string that allows privileged access to the tables and columns within that database needed for the current query, as shown in step 62. Since each query is an autonomous search object, capable of connecting to virtually any external SQL database 10 or 12, a collection of queries can be used to provide access to a variety of different databases 10, 12 using the query engine 32. Once the connection is established, the server 8 passes the SQL statement to the target database for execution, as shown in step 63. After the SQL statement is executed, the query engine 32 collects the results and adds them to the DTO results package, as shown in step 64. Once the DTO results package is complete, the server 8 returns it to the originating query interface 6, as shown in step 65, which displays the query results as a sortable list designed for simplified browsing (e.g. a Windows-like folder tree), as shown in step 66.
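Step 61 (field IDs resolved through a master table into a SQL statement) might look like the sketch below. `FieldMapping`, `GeneratedSql`, and `SqlGenerator` are hypothetical; the sketch handles a single table only and omits the foreign-key joins the master table would supply. It also binds criteria as `?` parameters for a JDBC `PreparedStatement` rather than splicing them into the SQL text, a design choice the source does not specify. Criteria containing `%` or `_` are treated as wildcard searches and use `LIKE`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical master-table entry: where a query field lives in the external database.
record FieldMapping(String schema, String table, String column) {}

// Generated SQL text plus bind values, in placeholder order.
record GeneratedSql(String sql, List<String> params) {}

class SqlGenerator {
    // Builds a parameterized SELECT for one table; joins across tables are omitted.
    static GeneratedSql generate(Map<String, FieldMapping> master, Map<String, String> criteria) {
        if (criteria.isEmpty()) {
            throw new IllegalArgumentException("no search criteria");
        }
        String from = null;
        List<String> where = new ArrayList<>();
        List<String> params = new ArrayList<>();
        for (Map.Entry<String, String> c : criteria.entrySet()) {
            FieldMapping m = master.get(c.getKey());
            if (m == null) {
                throw new IllegalArgumentException("unknown field " + c.getKey());
            }
            String table = m.schema() + "." + m.table();
            if (from == null) {
                from = table;
            } else if (!from.equals(table)) {
                throw new IllegalArgumentException("multi-table query needs joins");
            }
            // SQL wildcards select LIKE; plain values use equality.
            boolean wildcard = c.getValue().contains("%") || c.getValue().contains("_");
            where.add(m.column() + (wildcard ? " LIKE ?" : " = ?"));
            params.add(c.getValue());
        }
        String sql = "SELECT * FROM " + from + " WHERE " + String.join(" AND ", where);
        return new GeneratedSql(sql, params);
    }
}
```

For example, criteria `name = "Smi%"` and `invoice = "42"` mapped to `erp.customer` yield `SELECT * FROM erp.customer WHERE last_name LIKE ? AND invoice_no = ?` with parameters `["Smi%", "42"]`.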

Indexing is one of the most time-consuming and labor-intensive parts of the document imaging process. Indexing is a process whereby documents are assigned metadata ‘tags’, discrete pieces of information that are then used to retrieve those documents via the search process. The novel aspect of the invention is that it simplifies indexing by using external database records retrieved via real-time searching, without manual intervention, while all indexing metadata is maintained in its original (database) location, with no need to copy any of that data to a separate imaging database as other solutions do. Real-time database indexing thus avoids the common problems of data latency and synchronization, which are inescapable attributes of stand-alone imaging metadata databases.

The real-time database indexing process is built upon the real-time database searching process. For database indexing to occur, one or more search results must be returned via the database searching process, as discussed for FIG. 2. FIG. 3 is a process flow 70 illustrating the full lifecycle of the real-time database indexing process used in accordance with the invention.

As described above, for indexing to commence, at least one folder 24 must be returned via the real-time database searching process, as shown in step 71. After the search results have been returned and displayed by the query interface 6, an end-user can select any displayed folder for viewing or indexing, as shown in step 72. Once a folder is selected, its contents 26 are displayed and additional functionality is enabled (e.g. live data form, scanning/file uploading, drag/drop). An important yet often overlooked part of the indexing process is document preparation, as shown in step 73. Scanning generally requires more document preparation (e.g. unfolding, staple removal, pre-ordering), while file uploading and drag/drop typically require only locating and organizing multiple documents into a single folder for processing.

If the documents to be indexed are not hardcopy documents but existing electronic documents, as shown in step 74, then the file uploading process must be completed as detailed in U.S. patent application Ser. No. (Nolij 9375), which is incorporated herein by reference in its entirety, as shown in step 75. If hardcopy documents are to be indexed instead, then the scanning process must be completed as detailed in U.S. patent application Ser. No. 13/156,426 (Nolij 9249), which is incorporated herein by reference in its entirety, as shown in step 76. Once the hardcopy or existing electronic documents are captured, the query interface 6 receives confirmation of the completed process and displays a confirmation dialog with details, as shown in step 77, confirming that all processed documents are now indexed to the target folder unique ID, as shown in step 78. This is the heart of the indexing process: documents are automatically indexed to whatever unique ID, or key value, is tied to the real-time database search query simply by adding those documents to a folder.
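The branch at step 74 and the confirmation at steps 77 and 78 can be sketched as follows. The scanning and uploading processes themselves are detailed in the referenced applications and are not modeled here; `CaptureSource`, `Confirmation`, and `CaptureFlow` are hypothetical names, and the confirmation text is illustrative.

```java
import java.util.List;

// Hypothetical origin of the documents being added to a folder (step 74).
enum CaptureSource { HARDCOPY, ELECTRONIC }

// Hypothetical confirmation shown to the user after capture (step 77).
record Confirmation(String folderId, String method, List<String> indexedDocs) {
    String message() {
        return indexedDocs.size() + " document(s) captured by " + method
            + " and indexed to folder " + folderId;
    }
}

class CaptureFlow {
    // Hardcopy goes through scanning (step 76), electronic files through uploading
    // (step 75). Either way every captured document ends up indexed to the target
    // folder's unique ID (step 78); the capture path does not change the key.
    static Confirmation capture(String folderId, CaptureSource source, List<String> docs) {
        String method = switch (source) {
            case HARDCOPY -> "scan";
            case ELECTRONIC -> "upload";
        };
        return new Confirmation(folderId, method, List.copyOf(docs));
    }
}
```

The point the sketch makes is that the folder ID is the only index value the capture step needs; both paths converge on the same indexing outcome.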

Following the initial indexing phase (either scanning or file uploading), an optional secondary indexing process is available, as shown in step 79. Secondary indexing provides the ability to index by document type, a common designator denoting a collection of like documents (e.g. application, invoice, check), in addition to the previously assigned folder ID. If secondary indexing is used, one or more documents must be selected from the folder contents 26 list, as shown in step 80. Secondary indexing can be applied to individual or multiple documents as needed, which is a noticeable time saver when capturing a group of similar documents.

Secondary indexing can be implemented using two primary methods: manual and pick list. Manual secondary indexing involves keying in a document type, while pick list secondary indexing involves selecting a pre-defined document type from a displayed list of values. If manual secondary indexing is used, as shown in step 81, an end-user can left-click on the selected group of documents to activate over-type mode via a pop-up dialog, as shown in step 82, and type a document type name into this dialog by hand, as shown in step 83. If pick list indexing is used, an end-user can right-click on the selected group of documents to display a pre-defined list of document types, as shown in step 84. Selecting one of the items in the list, as shown in step 85, has the same effect as completing the manual entry: the new document type name is applied to all selected documents, as shown in step 86, and the indexing process is complete, as shown in step 87.
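The two secondary indexing modes can be sketched as two entry points that share one "apply to every selected document" step. The class name, the in-memory map, and the sample pick list values (taken from the examples in the text) are assumptions; the folder ID assignment is deliberately kept separate from the document type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SecondaryIndexer {
    // Hypothetical pre-defined document types offered by the pick list.
    static final Set<String> PICK_LIST = Set.of("application", "invoice", "check");

    // Document ID -> document type; the folder unique ID is stored elsewhere.
    final Map<String, String> docTypes = new LinkedHashMap<>();

    // Manual mode (steps 81-83): any non-blank typed name is accepted, trimmed.
    void applyManual(List<String> selected, String typedName) {
        String type = typedName.trim();
        if (type.isEmpty()) {
            throw new IllegalArgumentException("document type required");
        }
        apply(selected, type);
    }

    // Pick list mode (steps 84-85): only pre-defined types are accepted.
    void applyPickList(List<String> selected, String choice) {
        if (!PICK_LIST.contains(choice)) {
            throw new IllegalArgumentException("not in pick list: " + choice);
        }
        apply(selected, choice);
    }

    // Step 86: a single action applies the type to every selected document.
    private void apply(List<String> selected, String type) {
        for (String doc : selected) {
            docTypes.put(doc, type);
        }
    }
}
```

Restricting the pick list to known values keeps document types consistent across users, while manual entry trades that consistency for flexibility.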

The invention provides a novel database searching and indexing process that uses a query interface by which folders and documents are retrieved for viewing and indexing. By assigning search results to folders and indexing those results to the unique ID of each folder, the invention allows real-time indexing without the latency and other problems commonly associated with indexing. In this way, metadata and other database-related operations are not affected.

Although the present invention has been shown and described with respect to several preferred embodiments thereof, various changes, omissions and additions to the form and detail thereof may be made therein without departing from the spirit and scope of the invention.

Claims

1. A system for real-time document indexing comprising:

a browser that is executing on a client system, the browser includes functionalities allowing it to communicate with a remote computer system; and

a query interface that is executing within the framework of the browser, the query interface receives one or more query searches from an end-user and sends the one or more query searches to be processed by the remote computer system, the remote computer system sends to the query interface the results of the one or more query searches via the browser, the query interface assigns the results of the one or more query searches to a folder where the folder includes a unique identifier, the query interface indexes the results of the one or more query searches to the unique identifier of the folder.

2. The system of claim 1, wherein the query interface displays the contents of the folder.

3. The system of claim 1, wherein the query interface provides secondary indexing based on document types.

4. The system of claim 1, wherein the unique identifier comprises a unique key value.

5. The system of claim 1, wherein the query interface determines if there is a digital copy of the results of the one or more query searches.

6. The system of claim 5, wherein the query interface arranges to upload a digital copy if there is no digital copy of the results of the one or more query searches.

7. The system of claim 5, wherein the remote computer comprises a query engine that executes the one or more query searches sent by the query interface.

8. The system of claim 1, wherein the remote computer executes a plurality of threads once receiving the one or more query searches from the query interface.

9. A method of performing real-time document indexing comprising:

providing a browser that is executing on a client system, the browser includes functionalities allowing it to communicate with a remote computer system;

executing within the framework of the browser a query interface, the query interface receives one or more query searches from an end-user and sends the one or more query searches to be processed by the remote computer system, the remote computer system sends to the query interface the results of the one or more query searches via the browser, the query interface assigns the results of the one or more query searches to a folder where the folder includes a unique identifier, the query interface indexes the results of the one or more query searches to the unique identifier of the folder.

10. The method of claim 9, wherein the query interface displays the contents of the folder.

11. The method of claim 9, wherein the query interface provides secondary indexing based on document types.

12. The method of claim 9, wherein the unique identifier comprises a unique key value.

13. The method of claim 9, wherein the query interface determines if there is a digital copy of the results of the one or more query searches.

14. The method of claim 9, wherein the query interface arranges to upload a digital copy if there is no digital copy of the results of the one or more query searches.

15. The method of claim 9, wherein the remote computer comprises a query engine that executes the one or more query searches sent by the query interface.

16. The method of claim 9, wherein the remote computer executes a plurality of threads once receiving the one or more query searches from the query interface.

17. The method of claim 12, wherein the unique key value is derived from hashing or concatenation of various data elements.

18. The method of claim 17, wherein the unique identifier value is used to aggregate documents for retrieval or reporting.

19. The method of claim 18, wherein the documents inherit all of the associated data elements stored in an external relational system having the unique identifier.

20. A computer readable medium for storing a program being executed on a computer system, the program performing a method of real-time document indexing, said method comprising:

providing a browser that is executing on a client system, the browser includes functionalities allowing it to communicate with a remote computer system;

executing within the framework of the browser a query interface, the query interface receives one or more query searches from an end-user and sends the one or more query searches to be processed by the remote computer system, the remote computer system sends to the query interface the results of the one or more query searches via the browser, the query interface assigns the results of the one or more query searches to a folder where the folder includes a unique identifier, the query interface indexes the results of the one or more query searches to the unique identifier of the folder.

21. The computer readable medium of claim 20, wherein the query interface displays the contents of the folder.

22. The computer readable medium of claim 20, wherein the query interface provides secondary indexing based on document types.

23. The computer readable medium of claim 20, wherein the unique identifier comprises a unique key value.

24. The computer readable medium of claim 20, wherein the query interface determines if there is a digital copy of the results of the one or more query searches.

25. The computer readable medium of claim 20, wherein the query interface arranges to upload a digital copy if there is no digital copy of the results of the one or more query searches.

26. The computer readable medium of claim 20, wherein the remote computer comprises a query engine that executes the one or more query searches sent by the query interface.

27. The computer readable medium of claim 20, wherein the remote computer executes a plurality of threads once receiving the one or more query searches from the query interface.

28. The computer readable medium of claim 23, wherein the unique key value is derived from hashing or concatenation of various data elements.

29. The computer readable medium of claim 28, wherein the unique identifier value is used to aggregate documents for retrieval or reporting.

30. The computer readable medium of claim 29, wherein the documents inherit all of the associated data elements stored in an external relational system having the unique identifier.