System, method and program product for management of life sciences data and related research
System, method and program product for managing data for researchers. A research data server receives and manages experimental data and research data and results from the researchers, and operates with a virtual storage device to maintain the experimental data and research data and results. A reference data access server receives and manages external reference data relating to the research and operates with the virtual storage device to maintain the external reference data. Computational resources allow researchers to capture, process and analyze experimental data to obtain results. A research data network connects the virtual storage device, research data server, reference data access server and the computational resources to allow transfer of data therebetween. Security management services authenticate and authorize access by the researchers to the system.
The present invention relates generally to computer management of data and related research results. More specifically, the present invention relates to computer management of data and related research in life science fields.
Modern life sciences research, such as pharmaceutical research, typically requires applied, iterative, parallel research across many technical disciplines. Modern pharmaceutical research typically involves researchers from biology, genetics, chemistry, clinical and pathology disciplines. The research process is typically iterative, with the results from one discipline being supplied to another discipline, and so on, with each discipline analyzing and processing the supplied data and other data. Heretofore, there have been inadequate computer systems and methods for collaboration between researchers in the different disciplines, and for management of the overall process. These problems are exacerbated when large amounts of data are generated and must be transformed, translated, reorganized, analyzed or otherwise processed as the data moves between disciplines and/or research teams.
An object of the present invention is to provide an improved, comprehensive system, method and program product for collaboration among researchers and management of data and related research results.
Another object of the present invention is to provide a system, method and program product of the foregoing type which is suited for development of pharmaceuticals and other medical therapies.
SUMMARY OF THE INVENTION
The invention resides in a system, method and program product for managing data for researchers. Research data is automatically received from laboratory instruments. Established reference data is accessed from a database. Recently available reference data is automatically obtained. Experimental data is accessed from a database. There are a plurality of applications to process respective types of the data. In response to a request by a researcher to perform a data processing function, one or more of the processing applications are invoked and supplied with parameters to perform the data processing function. The one or more applications automatically access types of the data required to perform the respective data processing function.
According to features of the present invention, the determination of which of the processing applications to invoke and which parameters to supply to these processing applications can be based on a type of function requested by the researcher. The determination of the identities of files containing the data required by the one or more applications can be based on a type of function requested by the researcher. There can also be an application to analyze patterns in respective types of the data, and one of the processing applications receives from the pattern analyzing application a pattern used to perform the requested data processing function. There can also be a program for determining if available data is valid. If not, the available data is not used for the one or more processing applications. If so, the available data is used for the one or more data processing applications. There can also be a program for formatting results of the one or more data processing applications to correspond to respective types of data processing requests.
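The selection of a processing application, its parameters and its input data from the type of function requested can be sketched as follows. This is only an illustrative sketch and not the implementation described in this specification: the names AppSpec, handle_request, is_valid, mean_expression, REGISTRY and DATA_STORE, the "summarize_expression" function type and the validity rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AppSpec:
    """Hypothetical registry entry for one type of requested function."""
    run: Callable[[dict, Dict[str, list]], object]  # the processing application
    params: dict                                    # parameters supplied to it
    data_types: List[str]                           # types of data it needs


def is_valid(record: dict) -> bool:
    # Assumed validity rule for illustration: a record must carry a value.
    return record.get("value") is not None


def handle_request(function_type: str,
                   registry: Dict[str, AppSpec],
                   data_store: Dict[str, List[dict]]) -> object:
    """Pick the application and parameters from the function type,
    gather only valid data of the required types, and invoke it."""
    spec = registry[function_type]
    inputs = {
        data_type: [r for r in data_store.get(data_type, []) if is_valid(r)]
        for data_type in spec.data_types
    }
    return spec.run(spec.params, inputs)


def mean_expression(params: dict, inputs: Dict[str, list]) -> float:
    # Toy processing application: average of the valid microarray values.
    values = [r["value"] for r in inputs["microarray"]]
    return round(sum(values) / len(values), params["digits"])


REGISTRY = {
    "summarize_expression": AppSpec(mean_expression, {"digits": 2}, ["microarray"]),
}
DATA_STORE = {
    "microarray": [{"value": 1.0}, {"value": None}, {"value": 2.0}],
}

result = handle_request("summarize_expression", REGISTRY, DATA_STORE)
```

In this sketch the record without a value is excluded before the application runs, mirroring the validity check described above; a real system would apply its own validity criteria and result formatting.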
According to another embodiment of the present invention, there is provided another system for managing data for researchers. This other system comprises a virtual storage device including online and near line storage and having policies predefined for moving stored data between the online and near line storage. The system also comprises a research data server for receiving and managing experimental data and research data and results from the researchers and operating with the virtual storage device to maintain the experimental data and research data and results. The system also comprises a reference data access server receiving and managing external reference data relating to the research and operating with the virtual storage device to maintain the external reference data. The system also comprises computational resources for the researchers to capture, process and analyze experimental data to obtain results. The system also comprises a research data network connecting the virtual storage device, research data server, reference data access server and the computational resources to allow transfer of data there between. The research data network further includes security management services to authenticate and authorize access by the researchers to the system.
This other system may also include a data import controller connected to one or more public data networks (e.g., the Internet) as well as to the research data network. The data import controller is operable to retrieve external reference data from data sources external to the research data network according to one or more policies predefined by the researchers for retrieving external reference data. Also, the computational resources may include a high performance computing server comprising a cluster of homogeneous or hybrid computing resources. Also, the system may include a laboratory information management system connected to the research data network and to one or more laboratory instruments. The laboratory information management system receives experimental data from the laboratory instruments and provides that data to the research data server via the research data network.
According to another aspect of this other embodiment of the present invention, there is provided a method of managing data for research. A set of policies defining external reference information relevant to the research program is created. At predefined intervals, external reference information in accordance with the policies is retrieved. The retrieved information is compared with reference data stored in a reference data server to determine if the retrieved information is redundant or of lower quality than data already stored in the reference data server. The retrieved information which was determined to be non-redundant and/or of better quality is stored in the reference data server. Experimental data, optionally from one or more laboratory instruments, is stored in a research data server. The researchers are provided with access to the stored information in the reference data server and to experimental data in the research data server.
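The comparison of retrieved information against stored reference data can be sketched as follows. This is a minimal sketch under stated assumptions: the specification does not define how redundancy or quality is measured, so the record key, the exact-content test for redundancy and the numeric quality score are all hypothetical, as are the names ReferenceRecord and select_records_to_store.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReferenceRecord:
    key: str        # hypothetical identifier, e.g. an accession number
    content: str    # the reference data itself
    quality: float  # hypothetical quality score; higher is better


def select_records_to_store(retrieved: List[ReferenceRecord],
                            stored: Dict[str, ReferenceRecord]) -> List[ReferenceRecord]:
    """Keep retrieved records that are new, or that differ from the
    stored record with the same key and have a higher quality score."""
    accepted = []
    for record in retrieved:
        existing = stored.get(record.key)
        if existing is None:
            accepted.append(record)        # nothing stored yet: non-redundant
        elif record.content == existing.content:
            continue                       # identical content: redundant
        elif record.quality > existing.quality:
            accepted.append(record)        # better quality than stored data
        # otherwise: different but not better, so it is not stored
    return accepted


stored = {
    "P1": ReferenceRecord("P1", "MKTAYIAK", 0.5),
    "P3": ReferenceRecord("P3", "MSSHEGGK", 0.5),
    "P5": ReferenceRecord("P5", "MGLSDGEW", 0.9),
}
retrieved = [
    ReferenceRecord("P1", "MKTAYIAK", 0.5),   # redundant
    ReferenceRecord("P2", "MADEEKLP", 0.4),   # new
    ReferenceRecord("P3", "MSSHEGGKR", 0.8),  # better quality
    ReferenceRecord("P5", "MGLSD", 0.3),      # lower quality
]
accepted_keys = [r.key for r in select_records_to_store(retrieved, stored)]
```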
According to yet another aspect of the present invention, there is provided a method of managing data for research. Researchers are provided with access to a research data network. Reference data policies, which define for each researcher the types of reference data that will be of use to the researcher, and experimental data policies, which define for each researcher the types of experimental data and results that will be of use to the researcher, are created and stored on the research data network. At defined intervals, from data sources outside the research data network, external reference data as defined by the reference data policies is retrieved and examined to determine if it is redundant in view of reference data already stored on the research data network or if it is of better quality than reference data already stored on the network. The retrieved reference data which has been determined to be non-redundant or of better quality than reference data already stored is stored on the research data network. Experimental data is collected from laboratory instruments through the research data network and stored on the research data network. New reference data and experimental data are published to researchers according to the reference data policies and experimental data policies defined for the researchers.
According to yet another aspect of the present invention, there is provided a computer program product stored on a computer readable medium to manage data for research. First program instructions provide the researchers with access to a research data network. Second program instructions retrieve, at defined intervals, from data sources outside the research data network, external reference data as defined by reference policies created on the computer by researchers. Third program instructions examine the retrieved reference data to determine if it is redundant in view of reference data already stored on the research data network or if it is of better quality than reference data already stored on the network, and store on the research data network the retrieved reference data which has been determined to be non-redundant or of better quality than reference data already stored. Fourth program instructions store experimental data, optionally from one or more laboratory instruments, in a research data server. Fifth program instructions provide the researchers with access to the stored information in the reference data server and to experimental data in the research data server.
BRIEF DESCRIPTION OF THE DRAWINGS
While the following embodiment illustrates use of the present invention for pharmaceutical research, the present invention has other embodiments and uses as well. For example, the present invention can be employed in research for pharmaceuticals, treatments, diagnostics, non-drug treatment protocols and preventatives, and other sciences.
Modern life sciences research, such as high level drug discovery and development, comprises a series of steps for acquiring and analyzing chemical and biological data, wherein processing is performed at each step. For example, in the high-level drug research areas, the key activities typically include (a) the collection of “assay” data generated by laboratory instruments, (b) searching for and obtaining external reference and research materials, (c) analyzing the consolidated assay data and the external reference materials, and (d) deriving knowledge through the analysis. These tasks are typically repeated, in a cyclical manner, by each discipline within the research team. The present invention assists in these key activities, providing automation, data management and multidisciplinary collaboration facilities between researchers in different disciplines.
In
External reference data 28 can comprise well known or established gene sequence and protein databases, either public or private, clinical data, etc. External published data 32 can include newly discovered genes or proteins, novel drug targets such as novel chemical entities or novel molecular entities, new insights in disease mechanisms, etc. The external published data 32 may reside in a known database, such as the PubMed database which is maintained by the National Center for Biotechnology Information. The relevant published data can be periodically identified and retrieved automatically by the data retrieval engine 56 by key word searching or author searching, based on predefined key words and authors.
Internal researchers 36 are primary drivers of the current research effort. While
Laboratory instruments 40 can include any instrument useful to the research effort. In the life sciences field, such instruments can include gene sequencers, mass-spectrometers, crystallographic imaging devices, tomographic equipment, etc. Many such instruments are now robotic in nature and can directly interface with a laboratory information management system (LIMS) 44. An example of such an instrument is an ABI DNA sequencing system which directly interfaces to LIMS 44 and to one or more personal computers in the laboratory which provide for control and/or calibration of the device. Other instruments require manual operation and/or examination of their results by a technician or researcher, but these results are still provided to LIMS 44 for the assays. LIMS 44 are well known in the life sciences field and can be custom designed for a laboratory or can be purchased commercially, as desired.
As shown in
The data stored and used by system 20 is classified as follows for subsequent processing by research applications, as explained below. The external reference data 28 is classified by type based on the data file in which the external reference data resides. When a unit of external reference data 28 is identified as a candidate to be included in a data file, a data manager (person) determines the type of the candidate external reference data and stores it in the data file that is earmarked for this type of data. (Alternately, a program tool can search each unit of external reference by key word, and classify the external reference data based on the results of the key word search.) The web retrieval server 56 classifies each item of external published data 32 by type based on key words found in the publication. The type of data obtained from the LIMS is based on the type of instruments that are generating the data. Each of the classification types is one of a multiplicity of predetermined types. These types were predetermined by one or more scientists with expertise in such classification and research applications that will need the data. As explained in more detail below, each research application will need and fetch certain types of data to be processed on behalf of a researcher. Each data item can also be accompanied by a header file which indicates whether the data needs to be preprocessed (such as by preprocessing server 92 shown in
System 20 also includes a virtual storage device 52. As mentioned above, life sciences research can produce large quantities (petabytes or exabytes) of data. System 20 employs virtual storage device 52 to facilitate the handling of this data. Virtual storage device 52 comprises a collection of online storage devices, such as disk drives, solid state drives, etc. and a variety of near line storage devices, such as robotic tape libraries, etc. which can retrieve requested data within about one minute. Virtual storage device 52 employs a set of policies to manage the storage of research data sets between the online and near line storage subsystems. Such policies can employ strategies such as automatic migration of aged data from online to near line storage, heuristic migration based upon determined usage patterns for the data, etc. Because storage device 52 is virtual, it can be scaled as necessary by adding more storage devices. Also, it is transparent to a user whether desired data is stored in online or near line storage, although in the case of data stored on near line storage, the user may experience a slight delay in access. By way of example, virtual storage device 52 for current research efforts has at least several terabytes of online storage and several petabytes of near line storage. An example of a suitable virtual storage device 52 would be one or more IBM Enterprise Storage Servers and one or more IBM LTO UltraScalable Tape Libraries. In system 20, virtual storage device 52 stores all research data relating to a research effort, although local copies of smaller data sets can be maintained by researchers. By employing virtual storage device 52 across system 20, quality, integrity, security, privacy and availability of research data is assured.
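An age-based migration policy of the kind mentioned above can be sketched as follows. This is an illustrative sketch only: the DataSet class, the tier names, the 14-day idle threshold and the example dates are assumptions, and the sketch models only bookkeeping, not actual movement of data between devices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class DataSet:
    name: str
    last_accessed: datetime
    tier: str = "online"  # "online" or "nearline" (hypothetical tier labels)


def apply_age_policy(data_sets: List[DataSet],
                     now: datetime,
                     max_idle: timedelta) -> List[str]:
    """Migrate online data sets that have been idle longer than max_idle
    to near line storage; return the names of the data sets moved."""
    moved = []
    for data_set in data_sets:
        if data_set.tier == "online" and now - data_set.last_accessed > max_idle:
            data_set.tier = "nearline"
            moved.append(data_set.name)
    return moved


def read(data_set: DataSet, now: datetime) -> DataSet:
    """Access is the same call for either tier; near line data is simply
    recalled to online storage first (in practice, with some delay)."""
    if data_set.tier == "nearline":
        data_set.tier = "online"
    data_set.last_accessed = now
    return data_set


now = datetime(2024, 1, 31)
catalog = [
    DataSet("assay_run_a", last_accessed=datetime(2024, 1, 30)),
    DataSet("assay_run_b", last_accessed=datetime(2024, 1, 1)),
]
moved = apply_age_policy(catalog, now, max_idle=timedelta(days=14))
```

A heuristic policy based on usage patterns could replace the single idle threshold with, for example, an access-frequency estimate per data set; the caller-facing read function would not need to change.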
Data retrieval engine 56 operates with a web retrieval engine 60 to retrieve (based on predefined key word search and predefined author search policies) and process desired external reference data over public networks such as the Internet 68. Specifically, a data retrieval engine 56 in the form of a data import controller uses these policies established by the research team to have web retrieval engine 60 retrieve external data. Web retrieval engine 60 processes the policy-driven requests from data import controller 56 to automatically retrieve predefined external reference data through the Internet 68, or other networks, via appropriate protocol and/or data format converters. Policies for web retrieval engine 60 can include regularly scheduled searches of specific databases, identification and retrieval of updated versions of previously retrieved data, searches for new data sources, etc. Examples of suitable web retrieval engines are the IBM WebSphere software platform or Apache Software Foundation web server integrated with the IBM WebSphere platform. Web retrieval engine 60 can use any appropriate computer program to retrieve the desired external references, such as ftp for document transfers, SQL queries for database searches, etc. In one embodiment of the invention, web retrieval engine 60 includes local storage where retrieved information is temporarily stored, for subsequent processing by data import controller 56.
B2B engine 64 includes a web server and operates to make data from system 20 available to external researchers 24. Examples of suitable B2B web retrieval engines 64 also are the IBM WebSphere software platform or Apache Software Foundation server integrated with IBM WebSphere platform. As discussed below, system 20 includes security management services which operate to limit the data that can be accessed by an external user 24.
As illustrated in
System 20 also includes a preprocessing server 92 which comprises one or more computer systems. Preprocessing server 92 operates on the raw data provided by LIMS 44 to convert that data into data which is chemically, biologically, clinically etc. useful and relevant for the purpose and context of the research. Depending upon the nature of the assay and the devices performing the assay, this preprocessing can include data filtering, data normalization, data validation, etc. For example, preprocessing server 92 can filter data by removing a data set with key missing values, or data with noisy or statistically improbable values, such as long nucleotide segments in which all bases are identical. For example, preprocessing server 92 can normalize data by establishing a common scale or set of units for comparing disparate data, such as by multiplying the data by a constant to make the maximum value in each set precisely 1.0. For example, preprocessing server 92 can invalidate data which falls outside of certain expected data ranges, or is inherently invalid, such as ovarian cancer found in a man. Preprocessing server 92 can also assign a higher reliability to data which has previously been reviewed and annotated by a researcher.
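The filtering, normalization and validation steps described above can be sketched as follows. The function names, the treatment of a missing value as None and the example ranges are assumptions made for illustration; the specification does not prescribe particular thresholds or data structures.

```python
from typing import List, Optional


def has_missing_values(values: List[Optional[float]]) -> bool:
    """Filtering: flag a data set in which any key value is missing."""
    return any(v is None for v in values)


def longest_identical_run(sequence: str) -> int:
    """Filtering: length of the longest segment in which all bases are
    identical; a caller can reject sequences above a chosen threshold."""
    longest = current = 0
    previous = None
    for base in sequence:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def normalize_to_unit_max(values: List[float]) -> List[float]:
    """Normalization: multiply by a constant so the maximum becomes 1.0."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("normalization needs a positive maximum value")
    return [v / peak for v in values]


def in_expected_range(values: List[float], low: float, high: float) -> bool:
    """Validation: every value must fall inside the expected range."""
    return all(low <= v <= high for v in values)


normalized = normalize_to_unit_max([2.0, 4.0, 8.0])
run_length = longest_identical_run("ACGTTTTTA")
```

Each check is kept as a separate function so that a given assay type can apply only the steps relevant to it.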
System 20 includes an application server system 96 which includes a high performance computing server (HPCS). In a presently preferred embodiment, the HPCS comprises a high performance computing cluster, such as a Linux cluster of high speed processors, as this allows a large amount of available computing resources to be scaled appropriately, as needed, by adding or removing computing processors to and from the cluster. Gene multiple sequence alignment and/or protein folding are just a few examples of research activities which can require large amounts of computing resources. The HPCS is capable of serving as a back-end processing resource for several research applications. As explained in more detail below, application server system 96 also includes a data mining application 292 and research applications 282, 284 and 286.
System 20 also includes a post processor 100 to operate on results produced in application server system 96, or elsewhere, to convert resultant data into chemically, biologically, etc. useful forms which are relevant for the purpose and context of the research. This post processing can comprise data clustering, annotation, classification, presentation, etc. and can be performed by various applications.
System 20 also includes a knowledge management server (KMS) 104. KMS 104 provides the researchers with access to relevant biological, chemical and/or clinical information. The functions provided by KMS 104 can include, without limitation, data mining, ad hoc queries, statistical analysis, report generation, decision support, etc. An example of a suitable knowledge management server 104 can include an IBM Information Management for Scoring, Visualization, Modeling and Mining.
System 20 also includes a research application server (RAS) 108. RAS 108 runs a number of research applications, data mining applications and/or other tools required by researchers. These applications, data mining applications and/or tools can include, without limitation: an NCBI Basic Local Alignment Search Tool (“Blast”) program, multiple sequence alignment tools, gene expression applications, and applications for protein structure and function prediction. The Blast program is a search tool to determine the similarity of a given nucleic acid or protein sequence to thousands of other sequences in databases, such as NCBI databases. The multiple sequence alignment tools assist in deducing the function of new proteins, assisting in answering other biological questions such as the evolution and/or phylogenic relationship of the protein. The gene expression applications permit interactive retrieval and analysis of gene expression data with spotted microarray, high density oligonucleotide array, hybridization filtering, serial analysis of gene expression data and other techniques. The applications for protein structure and function prediction include primary sequence alignment, secondary and tertiary structure prediction methods, homology modeling and crystallographic diffraction pattern analysis, etc. As will be apparent to those of skill in the art, many other applications and/or tools can be employed in system 20, and RAS 108 can provide for centralized maintenance and control of these tools.
System 20 also includes a reference data access server 112 and a research data server 116 which operate with virtual storage device 52 and data import controller 56. Reference data access server 112 allows researchers to access reference data, both external data and internal data, for ad hoc queries against a virtual database through federated access to the data sources. By way of example, the virtual database system in reference data access server 112 can be the IBM DiscoveryLink middleware application and the IBM DB2 Universal Database, although other suitable techniques and/or applications can be used. The DiscoveryLink application allows an ad hoc query against multiple data sources in a single request and provides a single response, regardless of geographic locations of data sources, types, formats, schemas and operating platforms, network protocols, etc. External reference data (for example, genome, EST, protein and/or clinical databanks) can be consolidated, using replication, into one logical location to mitigate accessing external reference data through slow, external links such as Internet 68. Using local replicas of reference data provides significant advantages over using the original external sources, although provision must be made to maintain the currency of the external data and costs are incurred in providing the storage space for the local replicas. However, these issues are addressed via data import controller 56 and virtual storage device 52, as described above.
Research data server 116 allows various research applications to access consolidated research and/or experimental data. Examples of such research and/or experimental data include microarray data and serial analysis of gene expression data. Experimental data typically results from experiments performed by the same organization that employs the researcher who uses application server system 96. When a computer or other device generates the experimental data, the computer or other device automatically populates a database with the experimental data based on a configuration file within the computer. Examples of suitable research data servers 116 include the IBM Enterprise Storage System and the IBM Hierarchical Storage Management solutions.
In addition to the nodes, servers and other devices described above, system 20 also includes the following shared system services. Directory management provides naming services to registered entities (e.g., users, applications, other resources, etc.). Security management provides services to protect assets and resources, such as user/entity identification and validation/authentication, access control, privacy protection and security audit functions. System management, in conjunction with client software running on managed devices/nodes, provides management services such as problem alerts/reports, performance monitoring, software distribution, data backup and recovery, etc. Storage management, in conjunction with virtual storage device 52, provides integrated, consolidated and reliable data storage for reference data access, research data and experimental data.
Examples of the operation and use of system 20 in aspects of a research program will now be described.
As illustrated in
The raw microarray data 40 is collected into a local repository 270, then sent to preprocessing server 92 to filter out missing data, to check for errors such as smearing of the spots, and usually to perform a cluster analysis to group rows and columns that display similar colors and intensities. The result would typically be the standard Clustered Image Map (CIM) representation, which is stored in a reference repository. In parallel, clinical data 36 may be obtained in a standard representation, such as HL7 or CDISC, and stored in a local repository 272. Also in parallel, one or more databases of biochemical interaction data 24, such as the STKE database, are accessed and stored in a local repository 274.
The data 24, 36 and 40 is automatically made available to the application server system 96, research application server 108 and post processor 100, which in the example of
A microarray research technician 250 scans the output from the LIMS and the performance of the microarray instrumentation, for example by number of entries in the local repository or reference data repository. The research technician 250 also uses microarray pattern statistics output from the application server system 96. A molecular biologist researcher 252 obtains pattern and probability numerical values from the application server system 96, and then uses them to validate or extend the pathway data, using the identities of interacting biomolecules as obtained from the LIMS and external sources. A physician researcher 254 analyzes statistical correlation of microarray pattern probabilities with known disease states as a means of diagnosis and prognosis. Consequently, the physician uses primary trends and statistics of microarray patterns and clinical data output from the research application server 108 via the application server system 96 and access to the reference data repository of patterns generated by data mining application 290. A drug design researcher 256 uses clinical symptoms microarray statistics and their correlation with pathways output from research application server 108 via the application server system 96, and the following types of data output from the application server system 96: (a) names of biomolecules from clinical microarray data connected to disease state, (b) their probabilities of appearance in the microarrays (where these biomolecules have been identified using the reference data repository), and (c) probable molecular interactions to design a drug that will specifically interact with the molecules that support the disease. With this data, the pharmaceutical designer may then use standard drug design tools to develop a novel drug once these relevant biomolecule targets have been identified.
One of the important activities in research programs is to compare experimental data to reference data, to interpret the experimental data to obtain results, and to store those results with the experimental data.
In
Another important activity in research is the analysis of data and generation of results. An example of this activity, employing system 20, is shown in
In some cases, the results of the research may be published to internal researchers 36 on the research team, and to external researchers 24 and/or external databases. As used herein, the term “publishing” is intended to include the act of making information available for subsequent review, and this would include placing research results into a database which can be externally accessed, publishing scientific articles in journals, etc. and “pushing” results to researchers or institutions, etc. which have previously subscribed or otherwise indicated an interest in the results. Internal publishing can be synchronous, with internal researchers 36 accessing results and other information as it becomes available, while external publication can be either synchronous, such as when an external researcher 24 accesses results through knowledge management server 104, or asynchronous, such as when an external researcher 24 accesses an external replica of the published data held in an external database.
In
As will now be apparent, the present invention provides an end to end information technology system to allow researchers from multiple disciplines and in geographically diverse locations to cooperate in research efforts. Management of large amounts of experimental and reference information is provided to meet diverse researcher and regulatory requirements, while appropriate security of the managed information is maintained.
The above-described embodiments of the invention are intended to be examples of the present invention and alterations and modifications may be effected thereto, by those of skill in the art, without departing from the scope of the invention which is defined solely by the claims appended hereto.
Claims
1. A system for managing data for researchers, said system comprising:
- means for automatically receiving research data from laboratory instruments;
- means for accessing a database containing established reference data;
- means for automatically obtaining recently available reference data;
- means for accessing a database containing experimental data;
- a plurality of applications to process respective types of said data;
- means, responsive to a request by a researcher to perform a data processing function, for invoking one or more of said processing applications and supplying to said one or more processing applications parameters to perform said data processing function; and
- means, responsive to the invoking and supply of parameters, for said one or more processing applications to automatically access types of said data required to perform the respective data processing function.
2. A system as set forth in claim 1 wherein said invoking and supplying means determines which of said processing applications to invoke and which parameters to supply to said processing applications to be invoked, based on a type of function requested by said researcher.
3. A system as set forth in claim 1 wherein said invoking and supplying means determines identities of files containing said data required by said one or more applications, based on a type of function requested by said researcher.
4. A system as set forth in claim 1 further comprising an application to analyze patterns in respective types of said data, and wherein one of said one or more processing applications receives from the pattern analyzing application a pattern used to perform the requested data processing function.
5. A system as set forth in claim 1 further comprising means for determining if available data is valid, and
- if not, not using said available data for said one or more processing applications;
- if so, using said available data for said one or more processing applications.
6. A system as set forth in claim 1 further comprising means for formatting results of said one or more data processing applications to correspond to respective types of data processing requests.
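For illustration only, the invoking and supplying behavior recited in claims 1 through 5 can be sketched as a lookup from the requested function type to an application, its parameters and the data types it needs. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `DataFile`, `Plan`, `PLANS`, `handle_request`, the two toy applications and the function-type strings are all hypothetical, and validity is modeled as a simple flag.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataFile:
    name: str
    data_type: str  # e.g. "experimental" or "reference" (assumed labels)
    valid: bool     # outcome of a validity check, modeled here as a flag


@dataclass
class Plan:
    application: Callable[[List[DataFile], dict], str]
    parameters: dict
    data_types: List[str]


def align(files: List[DataFile], params: dict) -> str:
    # Stand-in for a real processing application.
    return f"align({len(files)} files, gap={params['gap']})"


def summarize(files: List[DataFile], params: dict) -> str:
    # Stand-in for a second processing application.
    return f"summarize({len(files)} files, top={params['top']})"


# Hypothetical table: requested function type -> application, parameters, data types.
PLANS: Dict[str, Plan] = {
    "sequence_comparison": Plan(align, {"gap": 2}, ["experimental", "reference"]),
    "summary_report": Plan(summarize, {"top": 10}, ["experimental"]),
}


def handle_request(function_type: str, catalog: List[DataFile]) -> str:
    """Pick the application and parameters for the request, then feed it valid data."""
    plan = PLANS[function_type]
    # Select only the data types the application needs, skipping invalid data.
    inputs = [f for f in catalog if f.data_type in plan.data_types and f.valid]
    return plan.application(inputs, plan.parameters)
```

With this arrangement the researcher names only the function to perform; which application runs, with which parameters and on which files, follows from the table.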
7. A method for managing data for researchers, said method comprising the steps of:
- automatically receiving research data from laboratory instruments;
- maintaining a database containing established reference data;
- automatically obtaining recently available reference data;
- maintaining a database containing experimental data;
- responsive to a request by a researcher to perform a data processing function, invoking one or more processing applications and supplying to said one or more processing applications parameters to perform said data processing function; and
- in response to the invoking and supply of parameters, said one or more processing applications automatically accessing types of said data required to perform the respective data processing function.
8. A method as set forth in claim 7 wherein said invoking and supplying step determines which of said processing applications to invoke and which parameters to supply to said processing applications to be invoked, based on a type of function requested by said researcher.
9. A method as set forth in claim 7 wherein said invoking and supplying step determines identities of files containing said data required by said one or more applications, based on a type of function requested by said researcher.
10. A method as set forth in claim 7 further comprising the steps of analyzing patterns in respective types of said data, and providing results of said analyzing pattern step to one of said one or more processing applications to perform the requested data processing function.
11. A method as set forth in claim 7 further comprising the step of determining if available data is valid, and
- if not, not using said available data for said one or more processing applications;
- if so, using said available data for said one or more processing applications.
12. A method as set forth in claim 7 further comprising the step of formatting results of said one or more data processing applications to correspond to respective types of data processing requests.
13. A system for managing data for researchers, said system comprising:
- a virtual storage device including online and near line storage, and having predefined policies for moving stored data between the online and near line storage;
- a research data server for receiving and managing experimental data, research data and research results, and operating with the virtual storage device to maintain the experimental data, research data and research results;
- a reference data access server receiving and managing external reference data relating to research of the researchers and operating with the virtual storage device to maintain the external reference data;
- computational resources for the researchers to capture and process experimental data to generate the research results; and
- a research data network connecting the virtual storage device, research data server, reference data access server and computational resources to allow transfer of data therebetween, the research data network further including security management services to authenticate and authorize access by the researchers.
14. The system of claim 13 further comprising a data import controller connected to the research data network and operable to retrieve external reference data from data sources external to the research data network according to one or more policies predefined by the researchers for retrieving external reference data.
15. The system of claim 14 wherein the data import controller processes retrieved reference data to determine if it is lower quality or redundant in view of reference data already stored in the virtual storage device.
16. The system of claim 15 wherein the data import controller filters out redundant or lower quality retrieved reference data from entry in the virtual storage device.
17. The system of claim 13 wherein the computational resources include a cluster of computing resources; and the computational resources further comprise a post processor, the post processor converting experimental data into useful forms which are relevant for the purpose and context of the research.
18. The system of claim 13 further comprising a laboratory information management system connected to the research data network and to one or more laboratory instruments, the laboratory information management system receiving experimental data from the laboratory instruments and providing that data to the research data server via the research data network.
19. The system of claim 18 further comprising a preprocessing server connected to the research data network, the laboratory information management system providing experimental data from the laboratory instruments to the preprocessing server which converts the experimental data into data which is useful and relevant for the research, the preprocessing server providing the converted data to the research data server via the research data network.
20. The system of claim 13 further including a knowledge management server connected to the research data network and operable to identify and provide a researcher with reference data and/or experimental data and results from the research data server and the reference data access server in accordance with queries made by the researcher.
21. The system of claim 20 wherein a researcher can create policies defining data types of interest to the researcher, and the knowledge management server, in accordance with the defined policy, identifies and provides reference data and experimental data and results of interest to the researcher.
22. The system of claim 13 further including a research application server connected to the research data network, the research application server providing at least one software application and/or tool required by researchers, the at least one application and/or tool operating on data stored in said virtual storage device in accordance with instructions from the researchers.
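For illustration only, the predefined policies of claim 13 for moving stored data between online and near line storage might be sketched as an idle-time rule. The claims do not state what the policies are based on, so the idle-time criterion, the `StoredItem` type and the `apply_policy` function are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StoredItem:
    name: str
    last_access: datetime
    tier: str = "online"  # "online" or "nearline"


def apply_policy(items: List[StoredItem], now: datetime, max_idle: timedelta) -> List[str]:
    """Move each item to the tier the (assumed) idle-time policy calls for.

    Items idle longer than max_idle go to near line storage; recently used
    items are kept in, or recalled to, online storage. Returns the names of
    the items that changed tier.
    """
    moved = []
    for item in items:
        target = "nearline" if now - item.last_access > max_idle else "online"
        if item.tier != target:
            item.tier = target
            moved.append(item.name)
    return moved
```

A real policy could equally key on data type, project or size; only the comparison inside the loop would change.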
23. A method of managing research conducted by researchers, said method comprising the steps of:
- creating a set of policies defining external reference information relevant to the research;
- retrieving, at predefined intervals, external reference information in accordance with the policies;
- comparing the retrieved information with reference data stored in a reference data server to determine if the retrieved information is redundant or of lower quality than data already stored in the reference data server and storing retrieved information which was determined to be non-redundant and/or of acceptable better quality in the reference data server;
- storing experimental data from at least one laboratory instrument in a research data server; and
- providing the researchers with access to the stored information in the reference data server and to experimental data in the research data server.
24. The method of claim 23, further comprising the step of the researchers defining a set of data storage policies for a virtual storage device including both online and near line storage capacity to store the data of the reference data server and the research data server, and moving the stored data between online storage capacity and near line storage capacity in accordance with the data storage policies.
25. The method of claim 23 further comprising the step of preprocessing the experimental data from the at least one laboratory instrument and storing the preprocessed experimental data in the research data server.
26. The method of claim 23 further comprising the step of publishing information to researchers by having the researchers identify to a knowledge management server information of interest to them and the knowledge management server examining the contents of the reference data server and the research data server to identify the information of interest to a researcher and the knowledge management server making the identified information available to the researcher.
27. The method of claim 23 further comprising the step of verifying the authenticity and authority of each researcher to access stored experimental data and stored reference data before providing that access.
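For illustration only, the comparing and storing step of claim 23 (see also claims 15 and 16) might be sketched as follows. The claims do not say how redundancy or quality is measured, so this sketch assumes each record carries an identifying key and a numeric quality score; `ReferenceRecord` and `import_reference_data` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ReferenceRecord:
    key: str        # identifies what the record describes (assumed)
    quality: float  # higher is better (assumed scoring)
    payload: str


def import_reference_data(
    store: Dict[str, ReferenceRecord],
    retrieved: Iterable[ReferenceRecord],
) -> List[str]:
    """Store retrieved records that are new or better than what is already held.

    A record whose key is already stored with equal or higher quality is
    treated as redundant or lower quality and is filtered out.
    Returns the keys of the accepted records, in arrival order.
    """
    accepted = []
    for rec in retrieved:
        existing = store.get(rec.key)
        if existing is None or rec.quality > existing.quality:
            store[rec.key] = rec
            accepted.append(rec.key)
    return accepted
```

Running such a filter at each retrieval interval keeps the reference data server from accumulating duplicates while still admitting improved versions of existing records.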
28. A method of managing data for research, said method comprising the steps of:
- providing a plurality of researchers with access to a research data network;
- creating reference data policies defining for each of said researchers types of reference data that will be of use to the researcher, creating experimental data policies defining for each of said researchers types of experimental data and results that will be of use to the researcher and storing these policies on the research data network;
- retrieving, at defined intervals, from data sources outside the research data network, external reference data as defined by the reference policies;
- examining the retrieved reference data to determine if it is redundant in view of reference data already stored on the research data network or if it is of better quality than reference data already stored on the network and storing retrieved reference data on the research data network which has been determined to be non-redundant or of better quality than reference data already stored;
- collecting experimental data from laboratory instruments through the research data network and storing the collected data on the research data network; and
- publishing new reference data and experimental data to researchers according to the reference data policies and experimental data policies defined for the researchers.
29. The method of claim 28 wherein the reference data and the experimental data are stored in a virtual storage device on the research data network, the virtual storage device having both online and near line data storage capabilities and the research team having predefined a storage policy executed by the virtual storage device to transfer stored data between the online storage and the near line storage.
30. The method of claim 28 further comprising the step of at least one researcher processing published experimental data to obtain experimental results which are stored on the research data network, the stored experimental results also being subsequently published in the publishing step to researchers in accordance with experimental data policies of researchers.
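For illustration only, the publishing step of claim 28 might be sketched as matching newly stored items against each researcher's policies. This sketch assumes a policy is simply a set of data-type labels of interest; the `publish` function and the labels used with it are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def publish(
    new_items: List[Tuple[str, str]],
    policies: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Route each new item to every researcher whose policy covers its data type.

    new_items holds (item name, data type) pairs; policies maps a researcher
    to the data types that researcher has declared to be of interest.
    """
    outbox: Dict[str, List[str]] = {researcher: [] for researcher in policies}
    for name, data_type in new_items:
        for researcher, wanted_types in policies.items():
            if data_type in wanted_types:
                outbox[researcher].append(name)
    return outbox
```

Results that a researcher derives from published data and stores back on the network (claim 30) would simply appear as further entries in `new_items` on a later publishing pass.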
31. A computer program product to manage data for research conducted by researchers, said computer program product comprising:
- a computer readable medium;
- first program instructions executable to provide the researchers with access to a research data network;
- second program instructions to retrieve, at defined intervals, from data sources outside the research data network, external reference data as defined by reference policies created by said researchers;
- third program instructions to examine the retrieved reference data to determine if it is redundant in view of reference data already stored on the research data network or if it is of better quality than reference data already stored on the network and to store retrieved reference data on the research data network which has been determined to be non-redundant or of better quality than reference data already stored;
- fourth program instructions to store experimental data from at least one laboratory instrument in a research data server; and
- fifth program instructions to provide the researchers with access to the stored information in the reference data server and to experimental data in the research data server; and wherein
- said first, second, third, fourth and fifth program instructions are recorded on said medium.
32. A computer program product according to claim 31, further including sixth program instructions to implement a set of data storage policies for a virtual storage device including both online and near line storage capacity to move the stored data between online storage capacity and near line storage capacity in accordance with the data storage policies; and wherein said sixth program instructions are recorded on said medium.
Type: Application
Filed: Oct 25, 2004
Publication Date: Jul 7, 2005
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION (ARMONK, NY)
Inventors: Ock Baek (Unionville), Carl Ewig (San Diego, CA)
Application Number: 10/973,959