System, method and article of manufacture for a knowledgebase framework
A system, method and article of manufacture are provided for a knowledgebase framework. Information is obtained from at least one source utilizing a network. Utilizing a knowledge model, an index is generated for the obtained information. The generated index includes a plurality of items each associated with at least some of the obtained information. Utilizing the network, the generated index is displayed to a user. The user is permitted to select at least one of the items of the index. The information associated with the selected item is then displayed to the user utilizing the network.
The present invention relates to information management and retrieval and more particularly to frameworks for obtaining, managing, and providing information from a plurality of information sources.
BACKGROUND OF THE INVENTION
People who use computer systems and networks often need to look up information about the system they are using. Traditionally, information was stored in books and manuals, which were often kept physically near to the computer. If a user needed to look up information, he turned to a single source: the paper manuals stored conveniently nearby.
Currently, however, the amount of technical information available about a given computer system can be very large and can be stored at a wide variety of sources. Information is often provided to customers in “online” form, dispensing entirely with paper copies. This online information includes online databases, CD ROM databases, proprietary help systems, and online manuals. Large amounts of technical information are also available from third party online sources and from sources such as the World Wide Web.
Amid an apparent wealth of online information, people still have problems finding the information they need. Online information retrieval may have problems including those related to inappropriate user interface designs and to poor or inappropriate organization and structure of the information. Storage of information online in a variety of forms leads to certain information retrieval problems, several of which are described below.
The existence of a variety of information sources leads to the lack of a unified information space. An “information space” is the set of all sources of information that is available to a user at a given time or setting. When information is stored in many formats and at many sources, a user is forced to spend too much “overhead” on discovering and remembering where different information is located (e.g., online technical books, manual pages (“manpages”), release notes, help information, etc.). The user also spends a large amount of time remembering how to find information in each delivery mechanism. Thus, it is difficult for the user to remember where potentially relevant information might be, and the user is forced to jump between multiple different online tools to find it.
The existence of a variety of information sources leads to information strategies that lack cohesion. Users currently must learn to use and remember a variety of metaphors, user interfaces, and searching techniques for each delivery mechanism and class of information. No one type of interface suits all users. Furthermore, a user may need different types of searching techniques and interfaces, depending on the circumstances and the nature of the specific information needed.
The existence of a variety of information sources leads to lack of links between sources of information. Conventional delivery mechanisms often support only loosely structured navigation, such as keyword search or hyperlinks. Such mechanisms provide the user with only a local organization of information instead of providing a global picture of the information space.
The existence of a variety of information sources leads to frustration if the information uses a wide variety of terms or uses terms not familiar to the user. In addition, users employ concepts and terms differently than technical writers and authors. Conventional delivery mechanisms often rely on a keyword search as a primary means of finding information. If the user's vocabulary does not sufficiently overlap with indices employed by a delivery mechanism, a keyword search will result in a high percentage of disappointing and frustrating “term misses.” The only recovery method for a failed keyword search is simply to guess at a better query.
The existence of a variety of information sources leads to titles and descriptions of the information that are not intuitive to a user. Users often conceptually group and describe problems differently than do information organizers and writers. If, for example, a user does not know the title of a book or the name of a database, he may not be able to find the information stored therein.
As computer systems become more complex and as sources of online information proliferate, it becomes more and more difficult for users to locate the information they need. Even worse, users may not always be aware of all the existing sources of information. Moreover, certain users may not use certain sources of information, even though they are aware of them, if they are not familiar with the interface or find it too difficult to use.
SUMMARY OF THE INVENTION
A system, method and article of manufacture are provided for a knowledgebase framework. Information is obtained from at least one source utilizing a network. Utilizing a knowledge model, an index is generated for the obtained information. The generated index includes a plurality of items each associated with at least some of the obtained information. Utilizing the network, the generated index is displayed to a user. The user is permitted to select at least one of the items of the index. The information associated with the selected item is then displayed to the user utilizing the network.
In an aspect of the present invention, one of the sources from which information is obtained may be an internal source. In another aspect of the present invention, one of the sources from which information is obtained may be an external source accessible utilizing a wide area network. In a further aspect of the present invention, the information obtained from the sources may include pharmaceutical information. In yet a further aspect of the present invention, displaying of the information associated with the selected item (or entry) to the user may also include utilizing the network to retrieve the associated information from the source from which the associated information was obtained. In yet another aspect of the present invention, the network may be capable of communicating using the TCP/IP protocol.
In an embodiment of the present invention, the network may be utilized to monitor one or more of the sources for updated information relating to one or more items in the index. In such an embodiment, when updated information is detected at one of the knowledge sources, a notice may be generated regarding the updated information. This notice may then be transmitted to the user utilizing the network to notify the user of the updated information. As an option, the user may be allowed to select the source(s) to be monitored for updates or other changes.
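By way of illustration only, the monitoring embodiment described above might be sketched as follows. The class name `SourceMonitor`, the `fetch` callable, and the wording of the notice are assumptions introduced for this sketch and do not appear in the original description. Change detection by content digest is likewise only one possible approach.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a stable digest of a source's current content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class SourceMonitor:
    """Hypothetical monitor for the sources a user has chosen to watch."""

    def __init__(self, fetch):
        self._fetch = fetch   # assumed callable: source name -> current content
        self._seen = {}       # source name -> fingerprint at last check

    def watch(self, source):
        """Start monitoring a source selected by the user."""
        self._seen[source] = fingerprint(self._fetch(source))

    def poll(self):
        """Return one notice per watched source whose content has changed."""
        notices = []
        for source, old in self._seen.items():
            new = fingerprint(self._fetch(source))
            if new != old:
                self._seen[source] = new
                notices.append(f"Updated information available from {source}")
        return notices
```

In this sketch only sources passed to `watch` are checked, which mirrors the option of letting the user select the sources to be monitored. Transmitting the notice over the network is left out.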
In another embodiment of the present invention, the user may be permitted to input a search term utilizing the network. The index may be searched for items associated with the search term. Items of the index associated with the search term may then be displayed to the user utilizing the network.
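A minimal sketch of such an index search is given below, assuming that each index item carries a name, a set of associated terms, and the source it was obtained from. The `IndexItem` structure and the case-insensitive exact-match rule are assumptions made for illustration; the description does not specify how items are matched to a search term.

```python
from dataclasses import dataclass, field


@dataclass
class IndexItem:
    """One item of the index (hypothetical structure)."""
    name: str
    terms: set = field(default_factory=set)   # terms associated with the item
    source: str = ""                          # where the information came from


def search_index(index, search_term):
    """Return the items whose name or associated terms match the search term."""
    needle = search_term.strip().lower()
    if not needle:
        return []
    return [
        item for item in index
        if needle == item.name.lower()
        or needle in {term.lower() for term in item.terms}
    ]
```

The returned items would then be displayed to the user, who may select one to view the associated information.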
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be better understood when consideration is given to the following detailed description thereof. Such description makes reference to the annexed drawings wherein:
Embodiments of the present invention show how the concept of knowledge integration can be applied in the business world, especially in the pharmaceutical industry. Aspects of the present invention may be targeted at users active in the drug discovery process, such as scientists and other researchers. Embodiments of the present invention may use knowledge integration technology to semantically integrate the knowledge capital located in various isolated repositories on the Internet. The information from these repositories is extracted and classified based on various facets such as, for example, drug, chemical compound, biological target, scientist, etc. As a result, embodiments of the present invention can graphically show users how the various facets of the information are related to each other.
An embodiment of a system in accordance with the present invention is preferably practiced in the context of a personal computer such as an IBM compatible personal computer, Apple Macintosh computer or UNIX based workstation. A representative hardware environment is depicted in
A preferred embodiment is written using JAVA, C, and the C++ language and utilizes object oriented programming methodology. Object oriented programming (OOP) has become increasingly used to develop complex applications. As OOP moves toward the mainstream of software design and development, various software solutions require adaptation to make use of the benefits of OOP. A need exists for these principles of OOP to be applied to a messaging interface of an electronic messaging system such that a set of OOP classes and objects for the messaging interface can be provided.
OOP is a process of developing computer software using objects, including the steps of analyzing the problem, designing the system, and constructing the program. An object is a software package that contains both data and a collection of related structures and procedures. Since it contains both data and a collection of structures and procedures, it can be visualized as a self-sufficient component that does not require other additional structures, procedures or data to perform its specific task. OOP, therefore, views a computer program as a collection of largely autonomous components, called objects, each of which is responsible for a specific task. This concept of packaging data, structures, and procedures together in one component or module is called encapsulation.
In general, OOP components are reusable software modules which present an interface that conforms to an object model and which are accessed at run-time through a component integration architecture. A component integration architecture is a set of architecture mechanisms which allow software modules in different process spaces to utilize each other's capabilities or functions. This is generally done by assuming a common component object model on which to build the architecture. It is worthwhile to differentiate between an object and a class of objects at this point. An object is a single instance of the class of objects, which is often just called a class. A class of objects can be viewed as a blueprint, from which many objects can be formed.
OOP allows the programmer to create an object that is a part of another object. For example, the object representing a piston engine is said to have a composition-relationship with the object representing a piston. In reality, a piston engine comprises a piston, valves and many other components; the fact that a piston is an element of a piston engine can be logically and semantically represented in OOP by two objects.
OOP also allows creation of an object that “depends from” another object. If there are two objects, one representing a piston engine and the other representing a piston engine wherein the piston is made of ceramic, then the relationship between the two objects is not that of composition. A ceramic piston engine does not make up a piston engine. Rather it is merely one kind of piston engine that has one more limitation than the piston engine; its piston is made of ceramic. In this case, the object representing the ceramic piston engine is called a derived object, and it inherits all of the aspects of the object representing the piston engine and adds further limitation or detail to it. The object representing the ceramic piston engine “depends from” the object representing the piston engine. The relationship between these objects is called inheritance.
When the object or class representing the ceramic piston engine inherits all of the aspects of the object representing the piston engine, it inherits the thermal characteristics of a standard piston defined in the piston engine class. However, the ceramic piston engine object overrides these with ceramic-specific thermal characteristics, which are typically different from those associated with a metal piston. It skips over the original and uses new functions related to ceramic pistons. Different kinds of piston engines have different characteristics, but may have the same underlying functions associated with them (e.g., how many pistons in the engine, ignition sequences, lubrication, etc.). To access each of these functions in any piston engine object, a programmer would call the same functions with the same names, but each type of piston engine may have different/overriding implementations of functions behind the same name. This ability to hide different implementations of a function behind the same name is called polymorphism and it greatly simplifies communication among objects.
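The piston engine example may be sketched in code as follows. This is an illustration only: the class names, method names, and the returned strings are placeholders chosen for this sketch, not part of the described embodiment.

```python
class Piston:
    """A component object that an engine is built from."""

    def __init__(self, material="metal"):
        self.material = material


class PistonEngine:
    """Base class: an engine contains pistons (composition)."""

    piston_material = "metal"

    def __init__(self, piston_count):
        self.pistons = [Piston(self.piston_material) for _ in range(piston_count)]

    def piston_count(self):
        return len(self.pistons)

    def thermal_characteristics(self):
        return "standard metal-piston thermal profile"


class CeramicPistonEngine(PistonEngine):
    """Derived class: inherits from PistonEngine and overrides what differs."""

    piston_material = "ceramic"

    def thermal_characteristics(self):
        # Overrides the inherited behavior with a ceramic-specific one.
        return "ceramic-piston thermal profile"


def describe(engine):
    """Call the same function names on any kind of engine (polymorphism)."""
    return f"{engine.piston_count()} pistons, {engine.thermal_characteristics()}"
```

Here `describe` never checks which kind of engine it receives. `piston_count` is inherited unchanged, while `thermal_characteristics` resolves to the overriding implementation when a `CeramicPistonEngine` is passed in.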
With the concepts of composition-relationship, encapsulation, inheritance and polymorphism, an object can represent just about anything in the real world. In fact, one's logical perception of the reality is the only limit on determining the kinds of things that can become objects in object-oriented software. Some typical categories are as follows:
- Objects can represent physical objects, such as automobiles in a traffic-flow simulation, electrical components in a circuit-design program, countries in an economics model, or aircraft in an air-traffic-control system.
- Objects can represent elements of the computer-user environment such as windows, menus or graphics objects.
- An object can represent an inventory, such as a personnel file or a table of the latitudes and longitudes of cities.
- An object can represent user-defined data types such as time, angles, and complex numbers, or points on the plane.
With this enormous capability of an object to represent just about any logically separable matters, OOP allows the software developer to design and implement a computer program that is a model of some aspects of reality, whether that reality is a physical entity, a process, a system, or a composition of matter. Since the object can represent anything, the software developer can create an object which can be used as a component in a larger software project in the future.
If 90% of a new OOP software program consists of proven, existing components made from preexisting reusable objects, then only the remaining 10% of the new software project has to be written and tested from scratch. Since 90% already came from an inventory of extensively tested reusable objects, the potential domain from which an error could originate is 10% of the program. As a result, OOP enables software developers to build objects out of other, previously built objects.
This process closely resembles complex machinery being built out of assemblies and sub-assemblies. OOP technology, therefore, makes software engineering more like hardware engineering in that software is built from existing components, which are available to the developer as objects. All this adds up to an improved quality of the software as well as an increased speed of its development.
Programming languages are beginning to fully support the OOP principles, such as encapsulation, inheritance, polymorphism, and composition-relationship. With the advent of the C++ language, many commercial software developers have embraced OOP. C++ is an OOP language that offers a fast, machine-executable code. Furthermore, C++ is suitable for both commercial-application and systems-programming projects. For now, C++ appears to be the most popular choice among many OOP programmers, but there is a host of other OOP languages, such as Smalltalk, Common Lisp Object System (CLOS), and Eiffel. Additionally, OOP capabilities are being added to more traditional popular computer programming languages such as Pascal.
The benefits of object classes can be summarized, as follows:
- Objects and their corresponding classes break down complex programming problems into many smaller, simpler problems.
- Encapsulation enforces data abstraction through the organization of data into small, independent objects that can communicate with each other.
- Encapsulation protects the data in an object from accidental damage, but allows other objects to interact with that data by calling the object's member functions and structures.
- Subclassing and inheritance make it possible to extend and modify objects through deriving new kinds of objects from the standard classes available in the system. Thus, new capabilities are created without having to start from scratch.
- Polymorphism and multiple inheritance make it possible for different programmers to mix and match characteristics of many different classes and create specialized objects that can still work with related objects in predictable ways.
- Class hierarchies and containment hierarchies provide a flexible mechanism for modeling real-world objects and the relationships among them.
Libraries of reusable classes are useful in many situations, but they also have some limitations. For example:
- Complexity. In a complex system, the class hierarchies for related classes can become extremely confusing, with many dozens or even hundreds of classes.
- Flow of control. A program written with the aid of class libraries is still responsible for the flow of control (i.e., it must control the interactions among all the objects created from a particular library). The programmer has to decide which functions to call at what times for which kinds of objects.
- Duplication of effort. Although class libraries allow programmers to use and reuse many small pieces of code, each programmer puts those pieces together in a different way. Two different programmers can use the same set of class libraries to write two programs that do exactly the same thing but whose internal structure (i.e., design) may be quite different, depending on hundreds of small decisions each programmer makes along the way. Inevitably, similar pieces of code end up doing similar things in slightly different ways and do not work as well together as they should.
Class libraries are very flexible. As programs grow more complex, more programmers are forced to reinvent basic solutions to basic problems over and over again. A relatively new extension of the class library concept is to have a framework of class libraries. This framework is more complex and consists of significant collections of collaborating classes that capture both the small scale patterns and major mechanisms that implement the common requirements and design in a specific application domain. Frameworks were first developed to free application programmers from the chores involved in displaying menus, windows, dialog boxes, and other standard user interface elements for personal computers.
Frameworks also represent a change in the way programmers think about the interaction between the code they write and code written by others. In the early days of procedural programming, the programmer called libraries provided by the operating system to perform certain tasks, but basically the program executed down the page from start to finish, and the programmer was solely responsible for the flow of control. This was appropriate for printing out paychecks, calculating a mathematical table, or solving other problems with a program that executed in just one way.
The development of graphical user interfaces began to turn this procedural programming arrangement inside out. These interfaces allow the user, rather than program logic, to drive the program and decide when certain actions should be performed. Today, most personal computer software accomplishes this by means of an event loop which monitors the mouse, keyboard, and other sources of external events and calls the appropriate parts of the programmer's code according to actions that the user performs. The programmer no longer determines the order in which events occur. Instead, a program is divided into separate pieces that are called at unpredictable times and in an unpredictable order. By relinquishing control in this way to users, the developer creates a program that is much easier to use. Nevertheless, individual pieces of the program written by the developer still call libraries provided by the operating system to accomplish certain tasks, and the programmer must still determine the flow of control within each piece after it's called by the event loop. Application code still “sits on top of” the system.
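As a rough sketch of the event loop arrangement just described, the following shows a loop that takes queued events and calls whichever pieces of the programmer's code were registered for them. The `EventLoop` class and its `on`, `post`, and `run` methods are hypothetical names used for illustration. A real loop would receive events from the mouse, keyboard, and operating system rather than from an in-memory queue.

```python
from collections import deque


class EventLoop:
    """Hypothetical event loop that dispatches events to registered handlers."""

    def __init__(self):
        self._handlers = {}     # event type -> list of handler callables
        self._queue = deque()   # pending (event type, payload) pairs

    def on(self, event_type, handler):
        """Register a piece of application code for one kind of event."""
        self._handlers.setdefault(event_type, []).append(handler)

    def post(self, event_type, payload=None):
        """Queue an event, standing in for a user action."""
        self._queue.append((event_type, payload))

    def run(self):
        """Dispatch queued events in arrival order until none remain."""
        while self._queue:
            event_type, payload = self._queue.popleft()
            for handler in self._handlers.get(event_type, []):
                handler(payload)
```

The order in which handlers run is decided by the order of the posted events, not by the order in which the programmer registered them, which is the inversion the paragraph above describes.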
Even event loop programs require programmers to write a lot of code that should not need to be written separately for every application. The concept of an application framework carries the event loop concept further. Instead of dealing with all the nuts and bolts of constructing basic menus, windows, and dialog boxes and then making these things all work together, programmers using application frameworks start with working application code and basic user interface elements in place. Subsequently, they build from there by replacing some of the generic capabilities of the framework with the specific capabilities of the intended application.
Application frameworks reduce the total amount of code that a programmer has to write from scratch. However, because the framework is really a generic application that displays windows, supports copy and paste, and so on, the programmer can also relinquish control to a greater degree than event loop programs permit. The framework code takes care of almost all event handling and flow of control, and the programmer's code is called only when the framework needs it (e.g., to create or manipulate a proprietary data structure).
A programmer writing a framework program not only relinquishes control to the user (as is also true for event loop programs), but also relinquishes the detailed flow of control within the program to the framework. This approach allows the creation of more complex systems that work together in interesting ways, as opposed to isolated programs, having custom code, being created over and over again for similar problems.
Thus, as is explained above, a framework basically is a collection of cooperating classes that make up a reusable design solution for a given problem domain. It typically includes objects that provide default behavior (e.g., for menus and windows), and programmers use it by inheriting some of that default behavior and overriding other behavior so that the framework calls application code at the appropriate times.
There are three main differences between frameworks and class libraries:
- Behavior versus protocol. Class libraries are essentially collections of behaviors that you can call when you want those individual behaviors in your program. A framework, on the other hand, provides not only behavior but also the protocol or set of rules that govern the ways in which behaviors can be combined, including rules for what a programmer is supposed to provide versus what the framework provides.
- Call versus override. With a class library, the code the programmer writes instantiates objects and calls their member functions. It's possible to instantiate and call objects in the same way with a framework (i.e., to treat the framework as a class library), but to take full advantage of a framework's reusable design, a programmer typically writes code that overrides and is called by the framework. The framework manages the flow of control among its objects. Writing a program involves dividing responsibilities among the various pieces of software that are called by the framework rather than specifying how the different pieces should work together.
- Implementation versus design. With class libraries, programmers reuse only implementations, whereas with frameworks, they reuse design. A framework embodies the way a family of related programs or pieces of software work. It represents a generic design solution that can be adapted to a variety of specific problems in a given domain. For example, a single framework can embody the way a user interface works, even though two different user interfaces created with the same framework might solve quite different interface problems.
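The "call versus override" distinction may be illustrated with the following minimal sketch. The `Application` base class stands in for a framework and `IndexBrowser` for application code. Both names, and the step names, are assumptions made for this example.

```python
class Application:
    """Hypothetical framework class that owns the flow of control."""

    def run(self):
        # The framework decides the order of the steps and calls them itself.
        return [self.create_window(), self.build_menu(), self.load_document()]

    # Default behavior supplied by the framework.
    def create_window(self):
        return "default window"

    def build_menu(self):
        return "default menu"

    def load_document(self):
        return "empty document"


class IndexBrowser(Application):
    """Application code: inherits the defaults and overrides one step."""

    def load_document(self):
        return "knowledge index"
```

The application never calls `load_document` itself. Calling `IndexBrowser().run()` lets the framework invoke the overriding method at the point its own design dictates, while the window and menu steps fall back to the inherited defaults.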
Thus, through the development of frameworks for solutions to various problems and programming tasks, significant reductions in the design and development effort for software can be achieved. A preferred embodiment of the invention utilizes HyperText Markup Language (HTML) to implement documents on the Internet together with a general-purpose secure communication protocol for a transport medium between the client and the Newco. HTTP or other protocols could be readily substituted for HTML without undue experimentation. Information on these products is available in T. Berners-Lee, D. Connolly, “RFC 1866: Hypertext Markup Language - 2.0” (November 1995); and R. Fielding, H. Frystyk, T. Berners-Lee, J. Gettys and J. C. Mogul, “Hypertext Transfer Protocol -- HTTP/1.1: HTTP Working Group Internet Draft” (May 2, 1996). HTML is a simple data format used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of domains. HTML has been in use by the World-Wide Web global information initiative since 1990. HTML is an application of ISO Standard 8879:1986, Information Processing Text and Office Systems; Standard Generalized Markup Language (SGML).
To date, Web development tools have been limited in their ability to create dynamic Web applications which span from client to server and interoperate with existing computing resources. Until recently, HTML has been the dominant technology used in development of Web-based solutions. However, HTML has proven to be inadequate in the following areas:
- Poor performance;
- Restricted user interface capabilities;
- Can only produce static Web pages;
- Lack of interoperability with existing applications and data; and
- Inability to scale.
Sun Microsystems' Java language solves many of the client-side problems by:
- Improving performance on the client side;
- Enabling the creation of dynamic, real-time Web applications; and
- Providing the ability to create a wide variety of user interface components.
With Java, developers can create robust User Interface (UI) components. Custom “widgets” (e.g., real-time stock tickers, animated icons, etc.) can be created, and client-side performance is improved. Unlike HTML, Java supports the notion of client-side validation, offloading appropriate processing onto the client for improved performance. Dynamic, real-time Web pages can be created. Using the above-mentioned custom UI components, dynamic Web pages can also be created.
Sun's Java language has emerged as an industry-recognized language for “programming the Internet.” Sun defines Java as: “a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multithreaded, dynamic, buzzword-compliant, general-purpose programming language. Java supports programming for the Internet in the form of platform-independent Java applets.” Java applets are small, specialized applications that comply with Sun's Java Application Programming Interface (API) allowing developers to add “interactive content” to Web documents (e.g., simple animations, page adornments, basic games, etc.). Applets execute within a Java-compatible browser (e.g., Netscape Navigator) by copying code from the server to client. From a language standpoint, Java's core feature set is based on C++. Sun's Java literature states that Java is basically, “C++ with extensions from Objective C for more dynamic method resolution.”
Another technology that provides similar function to JAVA is provided by Microsoft and ActiveX Technologies, to give developers and Web designers the wherewithal to build dynamic content for the Internet and personal computers. ActiveX includes tools for developing animation, 3-D virtual reality, video and other multimedia content. The tools use Internet standards, work on multiple platforms, and are being supported by over 100 companies. The group's building blocks are called ActiveX Controls, small, fast components that enable developers to embed parts of software in hypertext markup language (HTML) pages. ActiveX Controls work with a variety of programming languages including Microsoft Visual C++, Borland Delphi, Microsoft Visual Basic programming system and, in the future, Microsoft's development tool for Java, code named “Jakarta.” ActiveX Technologies also includes ActiveX Server Framework, allowing developers to create server applications. One of ordinary skill in the art readily recognizes that ActiveX could be substituted for JAVA without undue experimentation to practice the invention.
To improve the decision making process, it may be helpful to deliver the right information to the right person at the right time. For example, the right information may include information from all parts of the organization and from external sources, information in the context of the business process (regardless of the source or format), and relevant information about business entities and relationships (rather than keywords and documents). Delivering the right information to the right person may involve filtering of the information based on needs of the individual, and delivery of the filtered information to the individual or team. The right time may mean providing up-to-date information and information on demand. Several challenges exist today that can make it difficult to meet these requirements. For example, both internal and external information may exist in different environments, platforms, and formats, such as proprietary databases, project reports and e-mail messages. Additionally, the underlying information repositories, due to their heterogeneous nature, will need to remain unaltered because scientists and other business process participants store their information in diverse formats and the development of new applications using the repositories will continue, often in isolation. Further, traditional techniques of integration can be very time consuming to develop and are often inflexible to rapid change. For instance, an average data-warehousing project typically takes between nine and twelve months to complete and most of these projects will typically only integrate structured information. Also, external information can be an even greater challenge: there are over one billion web pages (with this number doubling every four months) and not all sites are useful or trustworthy.
With embodiments of the present invention, the right information can be delivered to the right person at the right time. With embodiments of the present invention, the information can come from internal and external sources. The information can also be cleansed, integrated and placed in the right business context and also be customized to meet an individual's particular needs. Embodiments of the present invention also allow information to be delivered proactively (i.e., “pushed”).
One aspect of the present invention is to help facilitate efficient collaboration by allowing the sharing of information with other team members and by providing a medium to communicate a set of well understood processes.
In closer detail, information may be contained in a plurality of internal sources 202 and external sources 204. An internal source 202 of information is typically an information source that is under the control of the entity that employs the user and whose information may be proprietary to the entity. Internal sources of information may include, for example: discovery information, PD information, clinical information, regulatory information, and M&S information. An external source 204 of information is typically an information source that is not under the control of the entity that employs the user. An external source may typically be accessible utilizing a wide area network such as the Internet and World Wide Web. External sources may include, for example: bio-analysis information, study management information, safety data information, market report information, and Internet websites including government, public, and subscription based websites.
The knowledgebase framework may also include an index creator 206 which is connected to the internal and external sources 202, 204 by a network. The index creator 206 may also include or have access to a knowledgebase model 208. Utilizing the knowledgebase model 208, the index creator 206 may extract a wide variety of information from the internal and external sources 202, 204, cleanse the extracted information, restructure the extracted information and then reconcile the extracted information into a knowledge model-based index.
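As an illustrative sketch only, the extract, cleanse, restructure and reconcile steps performed by the index creator 206 might look like the following. The record format, the `KNOWLEDGE_MODEL` set and the function names are assumptions made for this example and do not appear in the specification.

```python
from collections import defaultdict

# Hypothetical knowledge model: the item types the index is organized by.
KNOWLEDGE_MODEL = {"target", "compound", "disease", "scientist", "patent"}


def cleanse(record):
    """Normalize whitespace and case in a raw record (assumed dict format)."""
    return {
        "type": record["type"].strip().lower(),
        "name": " ".join(record["name"].split()),
        "source": record["source"].strip(),
    }


def build_index(raw_records):
    """Cleanse records, keep those the model knows, and reconcile duplicates.

    Returns {item_type: {item_name: sorted list of sources}}.
    """
    index = defaultdict(lambda: defaultdict(set))
    for raw in raw_records:
        rec = cleanse(raw)
        if rec["type"] not in KNOWLEDGE_MODEL:
            continue  # not part of the knowledge model, so it is skipped
        # Reconcile: the same item seen in several sources becomes one entry.
        index[rec["type"]][rec["name"]].add(rec["source"])
    return {t: {n: sorted(s) for n, s in items.items()} for t, items in index.items()}
```

In this sketch, an item reported by both an internal and an external source ends up as a single index entry that lists both sources.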
The knowledgebase framework 200 may also include an index database 210 coupled to the index creator 206 for storing the knowledge model-based index created by the index creator 206. Users may then access the knowledge model-based index stored in the database 210 from a browser/portal 212 utilizing the network. As an option, the knowledgebase framework may also include a web server 214 or other similar type of computer for interfacing the browser/portal 212 with the database 210.
Additionally, the knowledgebase framework may include a decision support application 216 for helping a user determine what is the right information for the user and help the user receive the right information at the right time for the user. The decision support application 216 (in combination with the browser/portal 212) provides the user with the capability to browse and navigate through an integrated web of knowledge regardless of the location of the knowledge sources. The decision support application 216 also allows the user to access internal and external information. The decision support application 216 may also be used to provide a user with information tailored for a specific process such as, for example, a drug discovery. The decision support application 216 may further be used to help deliver the right information to the user by allowing them to monitor internal and external events at a wide range of granularity.
The inter-relations between the various items of the knowledge model are illustrated in
The knowledge model also helps to provide an organizational structure to the index generated in the knowledgebase framework so that the items of the generated index are arranged according to the organization structure. In one embodiment of the present invention, the organizational structure of the generated index may be based on the inter-relations between the items of the knowledge model.
In an aspect of the present invention, one of the knowledge sources from which information is obtained may be an internal source under the control of the entity that employs the user and whose information therein may be proprietary to the entity. Some illustrative examples of internal sources include: a genomics database, a pre-clinical database, a clinical database, and/or a departmental reports database.
In another aspect of the present invention, one of the knowledge sources from which information is obtained may be an external source (e.g., a website) accessible utilizing a wide area network such as the Internet and World Wide Web. In general, the external sources may not typically be under the control of the entity that employs the user. Some illustrative examples of external sources include subscription based information and/or market reports.
In a further aspect of the present invention, the information obtained from the sources may include pharmaceutical information such as, for example, information relating to: a pharmaceutical therapeutic area, a pharmaceutical target, a pharmaceutical compound, a disease, a patent, the Food and Drug Administration (FDA) (such as information regarding FDA approval of a pharmaceutical), a person researching or working on a pharmaceutical, and/or pharmaceutical literature such as a periodical.
In an embodiment of the present invention, the network may be utilized to monitor one or more of the knowledge sources for updated information relating to one or more items in the index. In such an embodiment, when updated information is detected at one of the knowledge sources, a notice may be generated regarding the updated information. This notice may then be transmitted to the user utilizing the network to notify the user of the updated information. As an option, the user may be allowed to select the knowledge source(s) to be monitored for updates or other changes.
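A minimal sketch of the monitoring step described above, assuming that each knowledge source can be summarized as a snapshot mapping an item name to a version marker (for example a timestamp or a content hash). The snapshot format and the function name `detect_updates` are hypothetical and are used only for illustration.

```python
def detect_updates(monitored_items, previous, current):
    """Compare two snapshots of a source and build notices for monitored items.

    `previous` and `current` map an item name to a version marker; a notice is
    generated when a monitored item is new or its marker has changed.
    """
    notices = []
    for item in monitored_items:
        if item in current and current[item] != previous.get(item):
            notices.append(f"Updated information is available for {item}")
    return notices
```

Only items that the user has chosen to monitor produce notices, which matches the option of letting the user select what is monitored.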
In another embodiment of the present invention, the user may be permitted to input a search term for searching the index utilizing the network. Upon receipt of the search term, the index may be searched for items associated with the search term. Items of the index associated with the input search term (i.e., that match the search term) may then be displayed to the user utilizing the network.
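The search step might be sketched as follows, assuming the index is held as a mapping of item type to item names and that a match is a case-insensitive substring match. Both the index layout and the matching rule are assumptions for this example; the specification does not fix either.

```python
def search_index(index, term):
    """Return (item_type, item_name) pairs whose name contains the search term."""
    needle = term.casefold()
    return [
        (item_type, name)
        for item_type, items in index.items()
        for name in items
        if needle in name.casefold()
    ]
```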
In one aspect of the present invention, the items of the index may be organized and displayed in some sort of a hierarchical format such as, for example, a hierarchical tree format. In yet a further aspect of the present invention, displaying of the information associated with the selected item (or entry) to the user may also include utilizing the network to retrieve the associated information from the knowledgebase source (such as a website) from which the associated information was obtained. In even another aspect of the present invention, the network may be capable of communicating using TCP/IP protocol.
In one aspect of the present invention, the knowledge model may include a plurality of inter-associated or inter-related items. In such an aspect, generation of the index may include associating the extracted information with one or more of the items of the model, and then mapping the extracted information to the associated item. In this manner, when the index is displayed to a user, selection of the item by a user links the user to the associated information and the source of the information. As an option, the items of the knowledge model may include a therapeutic area item, a target item, a disease item, a scientist item, an organization item, a patent item, a compound item, a literature item, an FDA approval item, and/or a drug item.
In even another aspect of the present invention, the knowledge model may also provide an organizational structure to the generated index so that the items of the generated index are arranged according to the organization structure. As an illustrative example, the organizational structure may be a hierarchical tree of the items. In a further aspect of the present invention, the extracted information may include pharmaceutical information. In another aspect of the present invention, the knowledge sources may include one or more internal knowledge sources and/or external knowledge sources. In yet still another aspect of the present invention, the network may be capable of communicating using TCP/IP protocol.
In an embodiment of the present invention, a user may be permitted to access the database utilizing the network to retrieve the stored index. In another embodiment of the present invention, a query may be received utilizing the network whereupon, the index may be searched for information matching the query to thereby permit retrieval of the matching information utilizing the network.
In one embodiment of the present invention, the knowledgebase framework 200 may be used to help a user learn about a field and/or catch up on new developments in this field. In an embodiment of the present invention, a user may be able to use the knowledgebase framework 200 to find people who are involved in the area being studied and their background, previous research work done in the area (which in an illustrative embodiment may include a list of targets, compounds and drugs), and obtain research reports relating to the area being studied. Also, the user may utilize the knowledgebase framework 200 to find information from external sources such as, for example: recent patents, targets, compounds, and drugs relating to the area being studied, as well as the people (such as scientists) who are actively working in this field or area of study.
Upon logging in, the user has access to the knowledgebase framework utilizing the decision support application 216 to obtain information in the area of their study.
As illustrated in
In one embodiment of the present invention, one of the items may be selected (such as by clicking the right button of a mouse when the mouse pointer is over the item, i.e., “right clicking”) to display a pop-up menu 1110 which includes a monitor selection 1112 and a visit source selection 1114.
Utilizing the knowledgebase framework, a user may be able to monitor work done by others, such as scientists researching a particular area or field. This may be accomplished by selecting the monitor selection 1112 of a selected item, such as, for example, a scientist item 1118 displayed in the search and browse frame of
In one embodiment of the present invention, when the user logs into the decision support application 216, the user may see the most recent news about the scientists.
With continuing reference to
The monitored items portion 1604 may display a list of items 1616 selected by the user to be monitored by the knowledgebase framework. Like the recent news links 1606, the items 1616 in the monitored items portion 1604 may comprise links to access items in the knowledge model-based index.
With continuing reference to
In one aspect of the present invention, the target may be an item of the index displayed to the user utilizing the network. In another aspect of the present invention, the target may be: a publication (e.g., literature), a person (e.g., scientist), a therapeutic area, a disease, a biological target, an organization, a compound, a patent, FDA approval, and/or a drug.
In a further aspect of the present invention, a pharmaceutical database may be monitored for changes or updates relating to the target. In yet another aspect of the present invention, the network may comprise an intranet of an organization and the Internet.
In an embodiment of the present invention, the received information may be stored in memory. In another embodiment of the present invention, the retrieved data may be transmitted to the user after receipt of an indication that the user has logged on to the network. As an option to such an embodiment, the retrieved data may be automatically transmitted to the user after receipt of the indication that the user has logged on to the network.
In a further embodiment of the present invention, the user may be alerted that a change or update to the target has been monitored utilizing the network. In even another embodiment of the present invention, the user may be permitted to input a search term utilizing the network. In such an embodiment, items associated with the search term may be searched for upon receipt of the search term. Then those items which have been found to be associated with the inputted search term may be displayed to the user utilizing the network.
The research frame 1802 may also include selectable links for accessing various tools for the research frame such as for example, templates 1806 and target tracking tools 1808.
The following example describes an illustrative scenario for utilizing the knowledgebase framework in accordance with an embodiment of the present invention.
EXAMPLE
Anne Kline, a senior biologist at Acme Pharmaceutical, has just transferred from the Oncology department to the Cardiovascular department. She has a reasonably strong background in Cardiovascular. Prior to joining Acme Pharmaceutical, she worked at the Imperial College School of Medicine's Cardiovascular department for a couple of years. However, she has not been active in this area since she joined Acme Pharmaceutical 3 years ago. She needs to catch up with the new developments in this area, both inside and outside Acme Pharmaceutical. Acme Pharmaceutical has just installed a knowledgebase framework. The knowledgebase framework allows Acme Pharmaceutical's scientists to search, browse and monitor internal and external information available to them.
Anne accesses the knowledgebase framework from her computer desktop. She spends almost the entire day using the knowledgebase framework and at the end of the day she is able to find:
- The people in Acme Pharmaceutical who are involved in the cardiovascular area and their background
- Previous research work done within Acme Pharmaceutical (which includes a list of targets, compounds and drugs)
- Internal research reports
In addition, Anne also finds useful information from external sources such as recent:
- Patents
- Targets
- Compounds
- Drugs
- as well as the scientists who are actively working in this area
In addition, Anne finds two scientists whose work seems to be relevant to her first assignment. She sets up her profile in the knowledgebase framework in such a way that it will monitor any future work done by these scientists. The next time Anne accesses the knowledgebase framework, she will see the most recent news about those two scientists. She also knows that Merck has been very active in the Cardiovascular area. She sets up the knowledgebase framework to monitor any new publications, patents, and drug applications by Merck. The next time Anne accesses the knowledgebase framework, she will see the most recent news about Merck.
Anne's first assignment is to investigate TR27 K-Channel as a potential target for hypertension treatment. She uses the knowledgebase framework to find any previous work related to TR27. She finds only one article that is somewhat relevant. Since she will be working on this target for a while, she sets up the knowledgebase framework to monitor any new information related to TR27. One morning a couple of days later, Anne turns on her computer and the knowledgebase framework informs her that Pfizer has filed a patent and this patent has cited TR27. Anne quickly browses through the patent. Luckily, the patent cited TR27 for a different reason.
Later on that day, the knowledgebase framework informs her that there is a newly released internal report that mentioned this particular target. This report was filed by the Neurology department, right after the High Throughput Screening was conducted on the target. She downloads the report and studies it carefully.
She launches Target DB, a tool that stores information on all targets investigated by Acme Pharmaceutical, from the knowledgebase framework to find detailed information about the assay used for TR27. With help from the knowledgebase framework, Anne identifies the people involved with this target. She is able to contact one of those researchers for further information.
While her testing procedures will be different, Anne is able to use many parts of the results as a starting point. This encounter has saved her a few months of hard work. The two researchers are able to share a set of common processes and report templates to document their findings for further collaboration.
In accordance with an embodiment of the present invention, a BackgroundFinder (BF) is implemented as an agent responsible for preparing an individual for an upcoming meeting by helping him/her retrieve relevant information about the meeting from various sources. BF receives input text in character form indicative of the target meeting. The input text is generated in accordance with an embodiment of the present invention by a calendar program that includes the time of the meeting. As the time of the meeting approaches, the calendar program is queried to obtain the text of the target event and that information is utilized as input to the agent. Then, the agent parses the input meeting text to extract its various components such as title, body, participants, location, time etc. The system also performs pattern matching to identify particular meeting fields in a meeting text. This information is utilized to query various sources of information on the web and obtain relevant stories about the current meeting to send back to the calendaring system. For example, if an individual has a meeting with Netscape and Microsoft to talk about their disputes, the system would obtain this initial information from the calendaring system. It will then parse the text to determine that the companies in the meeting are “Netscape” and “Microsoft” and the topic is “disputes.” Then, the system queries the web for relevant information concerning the topic. Thus, in accordance with an objective of the invention, the system updates the calendaring system and eventually the user with the best information it can gather to prepare the user for the target meeting. In accordance with an embodiment of the present invention, the information is stored in a file that is obtained via selection from a link embedded in the calendar system.
Program Organization:
A computer program in accordance with an embodiment of the present invention is organized in five distinct modules: BF.Main, BF.Parse, BF.Error, BF.PatternMatching and BF.Search. There is also a frnMain which provides a user interface used only for debugging purposes. The executable programs in accordance with an embodiment of the present invention never execute with the user interface and should only return to the calendaring system through Microsoft's Winsock control. An embodiment of the system executes in two different modes, which can be specified on the command line sent to it by the calendaring system. When the system runs in simple mode, it forms a keyword query to submit to external search engines. When executed in complex mode, the system performs pattern matching before it forms a query to be sent to a search engine.
Data Structures:
The system in accordance with an embodiment of the present invention utilizes three user defined structures:
- tMeetingRecord;
- tAPatternElement; and
- tAPatternRecord.
The user-defined structure, tMeetingRecord, is used to store all the pertinent information concerning a single meeting. This info includes userID, an original description of the meeting, the extracted list of keywords from the title and body of meeting etc. It is important to note that only one meeting record is created per instance of the system in accordance with an embodiment of the present invention. This is because each time the system is spawned to service an upcoming meeting, it is assigned a task to retrieve information for only one meeting. Therefore, the meeting record created corresponds to the current meeting examined.
ParseMeetingText populates this meeting record and it is then passed around to provide information about the meeting to other functions.
If GoPatternMatch can bind any values to a particular meeting field, the corresponding entries in the meeting record are also updated. The structure of tMeetingRecord with each field described in parentheses is provided below in accordance with an embodiment of the present invention.
Public Type tMeetingRecord
- sUserID As String (user id given by Munin)
- sTitleOrig As String (original non stop listed title we need to keep around to send back to Munin)
- sTitleKW As String (stoplisted title with only keywords)
- sBodyKW As String (stoplisted body with only keywords)
- sCompany( ) As String (companies identified in title or body through pattern matching)
- sTopic( ) As String (topics identified in title or body through pattern matching)
- sPeople( ) As String (people identified in title or body through pattern matching)
- sWhen( ) As String (time identified in title or body through pattern matching)
- sWhere( ) As String (location identified in title or body through pattern matching)
- sLocation As String (location as passed in by Munin)
- sTime As String (time as passed in by Munin)
- sParticipants( ) As String (all participants engaged as passed in by Munin)
- sMeetingText As String (the original meeting text w/o userid)
End Type
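For readers more familiar with Python than with Visual Basic user-defined types, the meeting record can be approximated by the data class below. This is a sketch for illustration, not part of the described program: the class name `MeetingRecord` is an assumption, while the field names are copied from tMeetingRecord. Visual Basic dynamic string arrays are represented here as lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MeetingRecord:
    """Approximate Python counterpart of the tMeetingRecord type."""

    sUserID: str = ""          # user id given by Munin
    sTitleOrig: str = ""       # original title, before stop listing
    sTitleKW: str = ""         # stoplisted title with only keywords
    sBodyKW: str = ""          # stoplisted body with only keywords
    # Fields bound through pattern matching; each record gets its own lists.
    sCompany: List[str] = field(default_factory=list)
    sTopic: List[str] = field(default_factory=list)
    sPeople: List[str] = field(default_factory=list)
    sWhen: List[str] = field(default_factory=list)
    sWhere: List[str] = field(default_factory=list)
    sLocation: str = ""        # location as passed in by Munin
    sTime: str = ""            # time as passed in by Munin
    sParticipants: List[str] = field(default_factory=list)
    sMeetingText: str = ""     # original meeting text without the user id
```

Because only one meeting record is created per instance of the system, a single `MeetingRecord` object would be filled by the parsing step and then passed to the pattern matching and search steps.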
There are two other structures which are created to hold each individual pattern utilized in pattern matching. The record tAPatternRecord is an array containing all the components/elements of a pattern. The type tAPatternElement is an array of strings which represent an element in a pattern. Because there may be many “substitutes” for each element, we need an array of strings to keep track of what all the substitutes are. The structures of tAPatternElement and tAPatternRecord are presented below in accordance with an embodiment of the present invention.
- Public Type tAPatternElement
- elementArray( ) As String
- End Type
- Public Type tAPatternRecord
- patternArray( ) As tAPatternElement
- End Type
User Defined Constants:
Many constants are defined in the declaration section of each module of the program, and these may need to be updated periodically as part of maintaining the system in accordance with an embodiment of the present invention. The constants are kept accessible to allow dynamic configuration of the system as part of maintaining the code.
Included in the following tables are lists of constants from each module that are considered most likely to be modified from time to time. Other constants used in the code are not included in the following lists. This does not mean that the non-included constants will never change; it means that they will change much less frequently.
For the Main Module (BF.Main):
For the Search Module (BF.Search):
For the Parse Module (BF.Parse):
For Pattern Matching Module (BFPatternMatch): There are no constants in this module which require frequent updates.
General Process Flow:
The best way to depict the process flow and the coordination of functions between each other is with the five flowcharts illustrated in FIGS. 20 to 24.
One key thing to notice is that functions depicted at the same level of the chart are called in sequential order from left to right (or top to bottom) by their common parent function. For example, Main 2000 calls ProcessCommandLine 2010, then CreateStopListist 2020, then CreatePatterns 2030, then GoBackgroundFinder 2040. FIGS. 21 to 24 detail the logic for the entire program, the parsing unit, the pattern matching unit and the search unit respectively.
Detailed Search Architecture Under the Basic Search/Simple Query Mode
Search ALTA VISTA (Function Block 2070 of
The Alta Vista search engine identifies and returns general information about topics related to the current meeting as shown in function block 2070 of
NewsPage (Function Block 2075 of
The NewsPage search system is responsible for giving us the latest news topics related to a target meeting. The system takes all of the keywords from the title portion of the original meeting text and constructs a query to send to the NewsPage search engine. The keywords are logically combined together in the query. Only articles published recently are retrieved. The NewsPage search system provides a date restriction criterion that is settable by a user according to the user's preference. The top ranking stories are returned to the calendaring system.
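The logical combination of title keywords might be sketched as follows. The text does not state which Boolean operator is used or what query syntax the search engine accepts, so the `AND` default and the function name `build_keyword_query` are assumptions made purely for illustration.

```python
def build_keyword_query(keywords, operator="AND"):
    """Join de-duplicated keywords with a Boolean operator (assumed syntax)."""
    unique = []
    for keyword in keywords:
        if keyword and keyword not in unique:
            unique.append(keyword)  # keep first occurrence, preserve order
    return f" {operator} ".join(unique)
```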
Pattern Matching:
Limitations associated with a simple searching method include:
- 1. Because it relies on a stop list of unwanted words in order to extract from the meeting text a set of keywords, it is limited by how comprehensive the stop list is. Instead of trying to figure out what parts of the meeting text we should throw away, we should focus on what parts of the meeting text we want.
- 2. A simple search method in accordance with an embodiment of the present invention only uses the keywords from a meeting title to form queries to send to Alta Vista and NewsPage. This ignores an alternative source of information for the query, the body of the meeting notice. We cannot include the keywords from the meeting body to form our queries because this often results in queries which are too long and so complex that we often obtain no meaningful results.
- 3. There is no way for us to tell what each keyword represents. For example, we may extract “Andy” and “Grove” as two keywords. However, a simplistic search has no way of knowing that “Andy Grove” is in fact a person's name. Imagine the possibilities if we could somehow intelligently guess that “Andy Grove” is a person's name. We could find out if he is an Andersen person and, if so, what kind of projects he has been on before, and so on.
- 4. In summary, by relying solely on a stop list to parse out unnecessary words, we suffer from “information overload”.
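The limitations listed above can be seen in a small sketch of stop-list keyword extraction. The stop list shown is an invented sample, since the actual list is not given in the text, and the function name `extract_keywords` is hypothetical.

```python
import re

# Illustrative stop list only; the real list is not given in the text.
STOP_LIST = {"meet", "meeting", "with", "to", "the", "about", "talk", "and", "of", "a"}


def extract_keywords(text, stop_list=STOP_LIST):
    """Keep every word that is not on the stop list, preserving order."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    return [word for word in words if word.lower() not in stop_list]
```

For the text "Meet with Andy Grove to talk about the chips", this sketch keeps "Andy" and "Grove" as two unrelated keywords, which illustrates the third limitation: nothing in the result indicates that the two words together form a person's name.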
Pattern Matching Overcomes These Limitations:
Here's how the pattern matching system can address each of the corresponding issues above in accordance with an embodiment of the present invention.
- 1. By doing pattern matching, we match up only parts of the meeting text that we want and extract those parts.
- 2. By performing pattern matching on the meeting body, we extract only the parts of the meeting body that we want, so the meeting body no longer goes to waste.
- 3. Pattern matching is based on a set of templates that we specify, allowing us to identify people's names, company names, etc. from a meeting text.
- 4. In summary, with pattern matching, we no longer suffer from information overload. Of course, the big problem is how well our pattern matching works. If we rely exclusively on artificial intelligence processing, we do not have a 100% hit rate. We are able to identify about 20% of all company names presented to us.
Patterns:
A pattern in the context of an embodiment of the present invention is a template specifying the structure of a phrase we are looking for in a meeting text. The patterns supported by an embodiment of the present invention are selected because they are templates of phrases which have a high probability of appearing in someone's meeting text. For example, when entering a meeting in a calendar, many would write something such as “Meet with Bob Dutton from Stanford University next Tuesday.” A common pattern would then be something like the word “with” followed by a person's name (in this example it is Bob Dutton) followed by the word “from” and ending with an organization's name (in this case, it is Stanford University).
Pattern Matching Terminology:
Terminology associated with pattern matching includes:
- Pattern: a pattern is a template specifying the structure of a phrase we want to bind the meeting text to. It contains sub units.
- Element: a pattern can contain many sub-units. These subunits are called elements. For example, in the pattern “with $PEOPLE$ from $COMPANY$”, “with” “$PEOPLE$” “from” “$COMPANY$” are all elements.
- Placeholder: a placeholder is a special kind of element to which we want to bind a value. Using the above example, “$PEOPLE$” is a placeholder.
- Indicator: an indicator is another kind of element which we want to find in a meeting text but to which no value needs to be bound. There may often be more than one indicator we are looking for in a certain pattern. That is why an indicator is not an “atomic” type.
- Substitute: substitutes are a set of indicators which are all synonyms of each other. Finding any one of them in the input is good.
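The terminology above might be represented as nested lists, in the spirit of the tAPatternRecord and tAPatternElement types: a pattern is a list of elements, and each element is a list of its substitutes. The sample pattern and the helper names below are assumptions for illustration; in particular, treating “of” as a substitute for “from” is an invented example.

```python
# A pattern is a list of elements; each element is a list of substitutes.
PATTERN = [["with"], ["$PEOPLE$"], ["from", "of"], ["$COMPANY$"]]


def is_placeholder(element):
    """A placeholder element holds a single entry written as $NAME$."""
    if len(element) != 1:
        return False
    entry = element[0]
    return len(entry) > 2 and entry.startswith("$") and entry.endswith("$")


def describe(pattern):
    """Label each element of a pattern as a placeholder or an indicator."""
    return ["placeholder" if is_placeholder(e) else "indicator" for e in pattern]
```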
There may be five fields which are identified for each meeting:
- Company ($COMPANY$)
- People ($PEOPLE$)
- Location ($LOCATION$)
- Time ($TIME$)
- Topic ($TOPIC_UPPER$) or ($TOPIC_ALL$)
In parentheses are the illustrative placeholders used in the code as representation of the corresponding meeting fields.
Each placeholder may have the following meaning:
- $COMPANY$: binds a string of capitalized words (e.g., Meet with Joe Carter of <Andersen Consulting>)
- $PEOPLE$: binds series of string of two capitalized words potentially connected by “,” “and” or “&” (e.g., Meet with <Joe Carter> of Andersen Consulting, Meet with <Joe Carter and Luke Hughes> of Andersen Consulting)
- $LOCATION$: binds a string of capitalized words (e.g., Meet Susan at <Palo Alto Square>)
- $TIME$: binds a string containing the format #:## (e.g., Dinner at <6:30 pm>)
- $TOPIC_UPPER$: binds a string of capitalized words for our topic (e.g., <Stanford Engineering Recruiting> Meeting to talk about new hires).
- $TOPIC_ALL$: binds a string of words without really caring if it's capitalized or not. (e.g., Meet to talk about <ubiquitous computing>)
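One way to experiment with the placeholder meanings listed above is to translate them into regular expressions. The following is a sketch under stated assumptions, not the matching procedure of the described program: the regular expressions are approximations written for this example, only four of the placeholders are covered, and the function names are hypothetical.

```python
import re

CAP = r"[A-Z][A-Za-z]*"   # one capitalized word
NAME = rf"{CAP} {CAP}"    # two capitalized words, e.g. "Joe Carter"

# Approximate regular expressions for some of the placeholders.
PLACEHOLDERS = {
    "$COMPANY$": rf"{CAP}(?: {CAP})*",
    "$PEOPLE$": rf"{NAME}(?:(?:, | and | & ){NAME})*",
    "$LOCATION$": rf"{CAP}(?: {CAP})*",
    "$TIME$": r"\d{1,2}:\d{2}(?: ?[ap]m)?",
}


def compile_pattern(elements):
    """Turn a list of indicators and placeholders into one compiled regex."""
    parts = []
    for element in elements:
        if element in PLACEHOLDERS:
            group = element.strip("$").lower()
            parts.append(f"(?P<{group}>{PLACEHOLDERS[element]})")
        else:
            parts.append(re.escape(element))  # indicator: match literally
    return re.compile(r"\s+".join(parts))


def bind(elements, text):
    """Return the values bound to each placeholder, or {} if nothing matches."""
    match = compile_pattern(elements).search(text)
    return match.groupdict() if match else {}
```

With this sketch, binding the pattern "with $PEOPLE$ from $COMPANY$" against "Meet with Bob Dutton from Stanford University next Tuesday." yields "Bob Dutton" for the people field and "Stanford University" for the company field.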
The following table represents patterns supported by BF. Each pattern belongs to a pattern group. All patterns within a pattern group share a similar format and they only differ from each other in terms of what indicators are used as substitutes. Note that the patterns which are grayed out are also commented in the code. BF has the capability to support these patterns but we decided that matching these patterns is not essential at this point.
Using the Identified Meeting Fields:
Now that we have identified fields within the meeting text which we consider important, there are quite a few things we can do with it. One of the most important applications of pattern matching is of course to improve the query we construct which eventually gets submitted to Alta Vista and News Page. There are also a lot of other options and enhancements which exploit the results of pattern matching that we can add to BF. These other options will be described in the next section. The goal of this section is to give the reader a good sense of how the results obtained from pattern matching can be used to help us obtain better search results.
Alta Vista Search Engine:
A strength of the Alta Vista search engine is that it provides enhanced flexibility. Using its advanced query method, one can construct all sorts of Boolean queries and rank the search results as desired. However, one of the biggest drawbacks with Alta Vista is that it is not very good at handling a large query and is likely to give back irrelevant results. If we can identify the topic and the company within a meeting text, we can form a short but comprehensive query which will hopefully yield better results. We also want to focus on the topics found. It may not be of much merit to the user to find out information about a company, especially if the user already knows the company well and has had numerous meetings with them. It is the topics that the user wants to research.
News Page Search Engine:
A strength of the News Page search engine is that it does a great job searching for the most recent news if you are able to give it a valid company name. Therefore when we submit a query to the news page web site, we send whatever company name we can identify and only if we cannot find one do we use the topics found to form a query. If neither one is found, then no search is performed. The algorithm utilized to form the query to submit to Alta Vista is illustrated in
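The fallback order described for the News Page query (company name first, then topics, otherwise no search) can be sketched as below. The function name `choose_news_query` and the choice to join several values with spaces are assumptions for this example; the text does not specify how multiple company names or topics are combined.

```python
def choose_news_query(companies, topics):
    """Prefer identified company names, fall back to topics, else no search."""
    if companies:
        return " ".join(companies)
    if topics:
        return " ".join(topics)
    return None  # neither field was identified, so no search is performed
```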
The following table describes in detail each function in accordance with an embodiment of the present invention. The order in which functions appear mimics the process flow as closely as possible. When there are situations in which a function is called several times, this function will be listed after the first function which calls it and its description is not duplicated after every subsequent function which calls it.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
Claims
1. A method for a knowledgebase framework, comprising the steps of:
- (a) obtaining information from at least one source utilizing a network;
- (b) generating a knowledge model-based index for the obtained information using a knowledge model, wherein the generated knowledge model-based index comprises a plurality of items each associated with at least some of the obtained information;
- (c) displaying the knowledge model-based index to a user utilizing the network;
- (d) permitting the user to select at least one of the items of the knowledge model-based index; and
- (e) displaying the information associated with the selected item to the user utilizing the network.
2. A method as recited in claim 1, wherein the at least one source comprises an internal source.
3. A method as recited in claim 1, wherein the at least one source comprises an external source accessible utilizing a wide area network.
4. A method as recited in claim 1, wherein the information obtained from the sources includes pharmaceutical information.
5. A method as recited in claim 1, further comprising the steps of monitoring at least one of the sources utilizing the network for updated information relating to at least one of the items of the knowledge model-based index, generating a notice regarding the updated information, and transmitting the notice to the user utilizing the network.
6. A method as recited in claim 5, wherein the user selects the at least one source to be monitored.
7. A method as recited in claim 1, further comprising the steps of permitting the user to input a search term utilizing the network, searching the knowledge model-based index for items associated with the search term, and displaying items of the knowledge model-based index associated with the search term to the user utilizing the network.
8. A method as recited in claim 1, wherein displaying the information associated with the selected item to the user includes utilizing the network to retrieve the associated information from the source from which the associated information was obtained.
9. A method as recited in claim 1, wherein the network is capable of communicating using TCP/IP protocol.
10. A computer program embodied on a computer readable medium for a knowledgebase framework, comprising:
- (a) a code segment that obtains information from at least one source utilizing a network;
- (b) a code segment that generates a knowledge model-based index for the obtained information using a knowledge model, wherein the generated knowledge model-based index comprises a plurality of items each associated with at least some of the obtained information;
- (c) a code segment that displays the knowledge model-based index to a user utilizing the network;
- (d) a code segment that permits the user to select at least one of the items of the knowledge model-based index; and
- (e) a code segment that displays the information associated with the selected item to the user utilizing the network.
11. A computer program as recited in claim 10, wherein the at least one source comprises an internal source.
12. A computer program as recited in claim 10, wherein the at least one source comprises an external source accessible utilizing a wide area network.
13. A computer program as recited in claim 10, wherein the information obtained from the sources includes pharmaceutical information.
14. A computer program as recited in claim 10, further comprising a code segment that monitors at least one of the sources utilizing the network for updated information relating to at least one of the items of the knowledge model-based index, a code segment that generates a notice regarding the updated information, and a code segment that transmits the notice to the user utilizing the network.
15. A computer program as recited in claim 14, wherein the user selects the at least one source to be monitored.
16. A computer program as recited in claim 10, further comprising a code segment that permits the user to input a search term utilizing the network, a code segment that searches the knowledge model-based index for items associated with the search term, and a code segment that displays items of the knowledge model-based index associated with the search term to the user utilizing the network.
17. A computer program as recited in claim 10, wherein displaying the information associated with the selected item to the user includes utilizing the network to retrieve the associated information from the source from which the associated information was obtained.
18. A computer program as recited in claim 10, wherein the network is capable of communicating using TCP/IP protocol.
19. A system for a knowledgebase framework, comprising:
- (a) logic that obtains information from at least one source utilizing a network;
- (b) logic that generates a knowledge model-based index for the obtained information using a knowledge model, wherein the generated knowledge model-based index comprises a plurality of items each associated with at least some of the obtained information;
- (c) logic that displays the knowledge model-based index to a user utilizing the network;
- (d) logic that permits the user to select at least one of the items of the knowledge model-based index; and
- (e) logic that displays the information associated with the selected item to the user utilizing the network.
20. A system as recited in claim 19, further comprising logic that monitors at least one of the sources utilizing the network for updated information relating to at least one of the items of the knowledge model-based index, logic that generates a notice regarding the updated information, and logic that transmits the notice to the user utilizing the network.
Type: Application
Filed: Feb 6, 2004
Publication Date: Jul 21, 2005
Applicant:
Inventors: Harlan Hugh (Los Angeles, CA), Edy Liongosari (Wheeling, IL)
Application Number: 10/774,042