Identification of null sets in a context-based electronic document search

- IBM

A computer hardware-implemented method, system, and/or computer program product identifies a null set of synthetic event containing electronic files in a database of electronic files. A synthetic event, which is a non-executable descriptor of a set of context-related factors, is created. A context-based search of a database of electronic files is performed to identify a synthetic event containing electronic file that includes the synthetic event. In response to determining that there are no electronic files in the database of electronic files that contain the synthetic event, a set of binary data is transmitted/broadcast. The set of binary data includes a notice that there are no synthetic event electronic files in the database of electronic files.

Description
BACKGROUND

The present disclosure relates to the field of computers, and specifically to the use of computers when searching for documents. Still more particularly, the present disclosure relates to the use of computers in searching for documents through the use of context-based searches.

Documents, such as technical articles, research papers, academic studies, web pages, blogs, etc. provide information on a wide range of topics. This diversity of information makes the documents valuable to many different types of projects. However, current document search techniques only identify documents that address a specific question/topic, such that a specific question can be answered and/or known information can be confirmed.

SUMMARY

A computer hardware-implemented method, system, and/or computer program product identifies a null set of synthetic event containing electronic files in a database of electronic files. A synthetic event, which is a non-executable descriptor of a set of context-related factors, is generated. A context-based search of a database of electronic files is performed to identify a synthetic event containing electronic file that includes the synthetic event. In response to determining that there are no electronic files in the database of electronic files that contain the synthetic event, a set of binary data is broadcast/transmitted. The set of binary data includes a notice that there are no synthetic event electronic files in the database of electronic files.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 depicts an exemplary system and network in which the present disclosure may be implemented; and

FIG. 2 is a high level flow chart of one or more exemplary steps taken by a processor to identify a null set of synthetic events in a database of electronic files.

DETAILED DESCRIPTION

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

With reference now to the figures, and in particular to FIG. 1, there is depicted a block diagram of an exemplary system and network that may be utilized by and in the implementation of the present invention. Note that some or all of the exemplary architecture, including both depicted hardware and software, shown for and within computer 102 may be utilized by software deploying server 150, electronic file serving computer(s) 152, and/or report receiving computer(s) 154.

Exemplary computer 102 includes a processor 104 that is coupled to a system bus 106. Processor 104 may utilize one or more processors, each of which has one or more processor cores. A video adapter 108, which drives/supports a display 110, is also coupled to system bus 106. System bus 106 is coupled via a bus bridge 112 to an input/output (I/O) bus 114. An I/O interface 116 is coupled to I/O bus 114. I/O interface 116 affords communication with various I/O devices, including a keyboard 118, a mouse 120, a media tray 122 (which may include storage devices such as CD-ROM drives, multi-media interfaces, etc.), a printer 124, and external USB port(s) 126. While the format of the ports connected to I/O interface 116 may be any known to those skilled in the art of computer architecture, in one embodiment some or all of these ports are universal serial bus (USB) ports.

As depicted, computer 102 is able to communicate with a software deploying server 150, as well as electronic file serving computer(s) 152 and report receiving computer(s) 154, using a network interface 130. Network interface 130 is a hardware network interface, such as a network interface card (NIC), etc. Network 128 may be an external network such as the Internet, or an internal network such as an Ethernet or a virtual private network (VPN).

A hard drive interface 132 is also coupled to system bus 106. Hard drive interface 132 interfaces with a hard drive 134. In one embodiment, hard drive 134 populates a system memory 136, which is also coupled to system bus 106. System memory is defined as a lowest level of volatile memory in computer 102. This volatile memory includes additional higher levels of volatile memory (not shown), including, but not limited to, cache memory, registers and buffers. Data that populates system memory 136 includes computer 102's operating system (OS) 138 and application programs 144.

OS 138 includes a shell 140, for providing transparent user access to resources such as application programs 144. Generally, shell 140 is a program that provides an interpreter and an interface between the user and the operating system. More specifically, shell 140 executes commands that are entered into a command line user interface or from a file. Thus, shell 140, also called a command processor, is generally the highest level of the operating system software hierarchy and serves as a command interpreter. The shell provides a system prompt, interprets commands entered by keyboard, mouse, or other user input media, and sends the interpreted command(s) to the appropriate lower levels of the operating system (e.g., a kernel 142) for processing. Note that while shell 140 is a text-based, line-oriented user interface, the present invention will equally well support other user interface modes, such as graphical, voice, gestural, etc.

As depicted, OS 138 also includes kernel 142, which includes lower levels of functionality for OS 138, including providing essential services required by other parts of OS 138 and application programs 144, including memory management, process and task management, disk management, and mouse and keyboard management.

Application programs 144 include a renderer, shown in exemplary manner as a browser 146. Browser 146 includes program modules and instructions enabling a world wide web (WWW) client (i.e., computer 102) to send and receive network messages to the Internet using hypertext transfer protocol (HTTP) messaging, thus enabling communication with software deploying server 150 and other computer systems.

Application programs 144 in computer 102's system memory (as well as software deploying server 150's system memory) also include a null set search program (NSSP) 148. NSSP 148 includes code for implementing the processes described below, including those described in FIG. 2. In one embodiment, computer 102 is able to download NSSP 148 from software deploying server 150, including in an on-demand basis, wherein the code in NSSP 148 is not downloaded until needed for execution. Note further that, in one embodiment of the present invention, software deploying server 150 performs all of the functions associated with the present invention (including execution of NSSP 148), thus freeing computer 102 from having to use its own internal computing resources to execute NSSP 148.

Note that the hardware elements depicted in computer 102 are not intended to be exhaustive, but rather are representative to highlight essential components required by the present invention. For instance, computer 102 may include alternate memory storage devices such as magnetic cassettes, digital versatile disks (DVDs), Bernoulli cartridges, and the like. These and other variations are intended to be within the spirit and scope of the present invention.

With reference now to FIG. 2, a high level flow chart of one or more exemplary steps taken by a processor to identify a null set of synthetic events in a database of electronic files is presented. After initiator block 202, a description of a synthetic event (e.g., in the form of binary data that can be processed by computer hardware) is defined (block 204). The synthetic event is defined as a non-executable descriptor of a set of context-related factors. For example, a synthetic event may be the occurrence of a set of words A, B, and C (i.e., the occurrence of all three words is the “factor”) in a single document (i.e., where being within the same document is the “context”).
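The word-set example above can be illustrated with a minimal Python sketch. The names SyntheticEvent and contains_event, the sample words, and the simple tokenization are assumptions made for illustration only; the disclosure does not prescribe any particular data structure.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticEvent:
    """Non-executable descriptor: plain data with no behavior of its own."""
    factors: frozenset  # the words that must all occur (hypothetical field)
    context: str        # e.g., "single document" (hypothetical field)


def contains_event(event: SyntheticEvent, document_text: str) -> bool:
    """Return True when every factor word occurs in this one document."""
    words = set(re.findall(r"[a-z0-9]+", document_text.lower()))
    return event.factors <= words


# Hypothetical words standing in for A, B, and C.
event = SyntheticEvent(frozenset({"drought", "reservoir", "rainfall"}),
                       "single document")
```

Under these assumptions, contains_event returns True only when all three words appear in the same document, which mirrors the "factor" and "context" roles described above.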

Another example of a synthetic event may be a combination of facts about a patient, such as that patient's age, a medical diagnosis of a primary disease currently afflicting that patient, and a list of medications being taken by that patient (“factors”) while the patient is being diagnosed for a secondary (caused by the primary disease) disease (“context”).

Another example of a synthetic event may be a set of features being examined in a scientific laboratory while studying a particular disease. That is, in this example the “context” would be a research project that is directed towards understanding the etiology (underlying cause) of a particular disease, and the “factors” are the phenotype (physical appearance), genotype (genetic makeup), and environment (e.g., exposure to certain chemicals, etc.) common to persons having this particular disease.

With reference to block 206 in FIG. 2, a context-based search of electronic files in a database is then performed to locate the synthetic event, which was created in block 204, in each electronic file from a database of electronic files. These electronic files are provided by electronic file serving computer(s), such as the electronic file serving computer(s) 152 depicted in FIG. 1.

The term “context-based search” is defined as a search of electronic files that are contextually related to the original synthetic event. For example, assume that the synthetic event is generated while conducting medical research in a particular field (e.g., oncology). In this example, the activity type (research) defines the scope of the context and thus the context-based search, such that only files directly related to oncology research are searched.

Alternatively, the “context-based search” may be limited to only files that are not related to the activities that generated the synthetic event. For example, continue to assume that the activity that generated the original synthetic event was oncology research. By searching non-medical literature (e.g., economic studies) that are not directed towards oncology research, and yet still include a reference to the original synthetic event (e.g., descriptions of oncology research findings), an unexpected connection may be made between the original synthetic event and the non-synthetic event element(s) found in the non-medical literature.

As used herein, an electronic file is defined as any file or collection of data. Examples of such files/data collections include, but are not limited to, text based documents, image files, and audio files. Examples of text based documents include, but are not limited to, text files, blogs, tweets, e-mail messages, web pages, instant messages, etc. Examples of image files include, but are not limited to, MPEG (Moving Picture Experts Group) files for movies, JPEG (Joint Photographic Experts Group) files for still photos, TIFF (Tagged Image File Format) and PDF (Portable Document Format) files for scanned documents, DICOM (Digital Imaging and Communications in Medicine) files for medical images, FITS (Flexible Image Transport System) files for astronomy images, etc. Examples of audio files include, but are not limited to, audio recordings (e.g., WAV files, MP3 files, VOX files, etc.) generated from a microphone or other sound capturing device.

When searching for a text based document that contains certain words/phrases, a simple word search is performed on each document (electronic file) in a file database (e.g., research papers, magazine articles, etc. on the Internet or in a local database). Before performing this word search of the text based document, however, a determination is first made as to whether this text based document is contextually related to the original synthetic event, such that the search of the electronic files can be context-based. For example, assume that a synthetic event is that a “city” has an average high temperature of “90 degrees.” Before determining if an electronic document contains the synthetic event element “90 degrees”, a determination is first made as to whether the electronic document is actually related to meteorology. This determination can be made by a search of “keywords” listed for many articles. These keywords provide words (such as “meteorology”) that describe the context of the text based document. However, if no such keyword listing exists, then the document must be examined for context.

For example, if a search identifies words such as “diploma” and “curriculum” in a particular electronic file, then the “90 degrees” is probably describing a college, and is skipped. Similarly, if the phrases “right angle” or “food” or “patient” or “channel iron” are in a particular electronic file, these files are also skipped, since the content of such an electronic file is not contextually related. That is, the context-based search is not interested in, and therefore ignores, articles about a college that offers diplomas in 90 different disciplines (“90 degrees” that are offered by academia), math publications about right angles (“90 degrees” of arc), articles about food preparation (recommending that a sauce be kept at a temperature above “90 degrees”), articles about patients having hypothermia (describing a patient's core body temperature dropping down to “90 degrees”), or brochures about structural iron (advertising “90 degree” channel iron).

However, if terms such as “weather” or “drought” occur in an electronic file, then that electronic file is likely related to the synthetic event (“city having an average high temperature of 90 degrees”), and is thus identified as a contextually-related and therefore synthetic event containing electronic file.
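The context check described in the preceding paragraphs might be sketched as follows. This is a simplified illustration under stated assumptions: the term lists are taken from the examples above, while the function names and the naive lower-case substring matching are hypothetical and would be too crude for real documents.

```python
# Terms from the "90 degrees" example; the set names are hypothetical.
CONTEXT_TERMS = {"weather", "drought", "meteorology"}
OFF_CONTEXT_TERMS = {"diploma", "curriculum", "right angle",
                     "food", "patient", "channel iron"}


def is_contextually_related(text, keywords=None):
    """Decide context from a keyword listing if present, else from the text."""
    if keywords is not None:
        # A keyword listing exists, so it alone decides the context.
        return any(k.lower() in CONTEXT_TERMS for k in keywords)
    lowered = text.lower()
    if any(term in lowered for term in OFF_CONTEXT_TERMS):
        return False  # e.g., "90 degrees" offered by a college: skip
    return any(term in lowered for term in CONTEXT_TERMS)


def contains_synthetic_event(text, keywords=None):
    """Context check first, then the simple word search."""
    if not is_contextually_related(text, keywords):
        return False
    lowered = text.lower()
    return "city" in lowered and "90 degrees" in lowered
```

In this sketch a file that mentions "diploma" is skipped even though it contains "90 degrees", while a file that mentions "drought" proceeds to the word search.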

Note that in another embodiment, the determination of the context of the searched text based document is made after the synthetic event element is identified in a particular document/electronic file. For example, assume that a text based document is short (less than a predetermined number of words) and does not have a “keyword” listing. In this case, a search is made for the terms “city” and “90 degrees” in the document. If the terms are found in the document, then a context evaluation (using context determination methodology such as that described herein) determines whether that document is relevant before searching for the original synthetic event in that document.

While searching a text based document can be performed as described above, searching for a synthetic event in a video file requires additional processing. First, a query is made to determine whether metadata describing the synthetic event, as well as metatags describing the images being searched, are available. If so, then the metadata/metatag is simply searched for, as with a text search. However, if such metadata is not available (or at least not with the degree of specificity needed to identify the synthetic event), then image matching must be performed. That is, a particular image (e.g., a bright spot) that makes up part of the video file's synthetic event (a retina having a bright spot indicating a hole in the retina) is digitized into a binary value. This binary value, along with other digitized images (i.e., digital files describing the retina) from the synthetic event, are then searched for in other digitized electronic video files. A similar process occurs with audio files, in which a particular sound (e.g., screeching tires as a “context-related factor” within the “context” of an automobile accident) is digitized into a binary value, which is used in the search of digitized electronic audio files.
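As a rough illustration of the metadata-first, binary-match-second order described above, consider the sketch below. The dictionary layout, the function name find_in_media, and the exact byte-sequence comparison are all assumptions; practical image or audio matching would typically need tolerant comparison rather than an exact byte match.

```python
def find_in_media(media: dict, tag: str, pattern: bytes) -> bool:
    """Search metatags when available; otherwise search the digitized payload.

    media is a hypothetical record with an optional "metatags" set and a
    "payload" of raw bytes; pattern is the digitized binary value.
    """
    tags = media.get("metatags")
    if tags:
        # Metadata is available, so it is searched as with a text search.
        return tag in tags
    # No usable metadata: look for the digitized binary value itself.
    return pattern in media["payload"]
```

For example, a record without metatags whose payload contains the bytes of the digitized bright spot would be reported as a match, while a tagged record is decided by its tags alone.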

In one embodiment, the electronic files that are selected for searching are based on a ranking of their source, in which the ranking is based on a public reputation of the source. For example, assume that a particular electronic file is from a known, trusted, and highly respected source (e.g., a prestigious research journal). This description of the source (“known, trusted, highly respected”) leads to a weighting of this particular source. For example, this source may be given a weighting of “9” (on a scale of 1-10). Another electronic file, however, comes from a source that is given a weighting of only “2”, since it comes from a blog entry that has not been peer-reviewed, and the author is anonymous. In this example, the higher ranked (“9”) source is weighted higher than the lower ranked (“2”) source. In one embodiment, the higher ranked sources are searched before the lower ranked source down to some predetermined baseline. That is, a predetermination may be made that only sources ranked between 9-10 will be initially searched for synthetic event containing electronic files. If time, computer resources, and/or money are still available, then sources ranked between 7-8 will be searched. The process continues until 1) there is no more time, computer resources, money, etc. available; 2) all available electronic files have been examined (e.g., within a local database); or 3) only available electronic files ranked higher than some predetermined number (e.g., those electronic files whose sources are ranked higher than “7”) have been predetermined to be authorized for examination, and all such ranked electronic files have been examined.
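The tiered search order described above can be sketched as follows. The function name tiered_search, the (weight, text) tuples, and the use of a simple file-count budget to stand in for "time, computer resources, and/or money" are assumptions made for illustration.

```python
def tiered_search(files, matches, tiers=((9, 10), (7, 8)), budget=100):
    """Search higher-weighted sources first, stopping when the budget runs out.

    files   -- list of (source_weight, text) pairs (hypothetical layout)
    matches -- predicate deciding whether a text contains the synthetic event
    tiers   -- inclusive weight ranges, searched in the order given
    budget  -- maximum number of files that may be examined
    """
    hits = []
    examined = 0
    for low, high in tiers:
        for weight, text in files:
            if not (low <= weight <= high):
                continue  # this file belongs to another tier, or to none
            if examined >= budget:
                return hits, examined  # resources exhausted
            examined += 1
            if matches(text):
                hits.append(text)
    return hits, examined
```

With the default tiers, a file whose source is weighted "2" is never examined, which corresponds to the predetermined baseline mentioned above.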

In one embodiment, the ranking of the source of the electronic files is based on an historical frequency of usage of the source by a generator of the synthetic event. For example, assume that a particular user and/or computer system routinely examines a particular database for electronic files. In a first embodiment, a source that is frequently used is deemed to be more trustworthy, and thus is weighted higher. However, in a second embodiment, a less frequently used (and thus more obscure) source is deemed to be more likely to provide a non-synthetic event element (described below) that has not been previously considered, and thus is weighted higher. In either embodiment, the higher ranked sources are searched before the lower ranked source down to some predetermined baseline, as described above.
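The two frequency-based embodiments differ only in sort direction, as the following sketch suggests. The function name rank_sources, the prefer_obscure flag, and the sample usage counts are hypothetical.

```python
def rank_sources(usage_counts, prefer_obscure=False):
    """Order sources by historical usage frequency.

    prefer_obscure=False: frequently used sources first (first embodiment).
    prefer_obscure=True:  rarely used sources first (second embodiment).
    """
    return sorted(usage_counts,
                  key=lambda source: usage_counts[source],
                  reverse=not prefer_obscure)


# Hypothetical usage history for one searcher.
usage = {"journal_db": 120, "local_archive": 3, "preprint_server": 40}
```

Here rank_sources(usage) places "journal_db" first, while rank_sources(usage, prefer_obscure=True) places "local_archive" first.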

With reference now to query block 208, if none of the electronic files in the database of electronic files contain the synthetic event defined in block 204 (i.e., a “null set” of synthetic events is found when searching the entire database of electronic files), then a user is given notice (block 210) that no single electronic file from the database of electronic files contains all of the elements of the synthetic event defined in block 204. That is, a report can be broadcast to multiple computers, including the report receiving computer(s) 154 shown in FIG. 1, that none of the electronic files contains a particular synthetic event that the searcher created. Furthermore, in one embodiment, a recommendation is generated (in the form of a second set of binary data that can be manipulated by computer hardware) and transmitted to the requesting computer (also block 210). This recommendation is based on an absence of synthetic event containing electronic files in the database of electronic files (i.e., no single electronic file in the database of electronic files contains all of the elements of the synthetic event). Such a recommendation may be to 1) conduct an activity that is unrelated to the original synthetic event, or 2) conduct additional steps related to the original synthetic event.
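The null set determination and the resulting notice might look like the sketch below. The function name, the JSON layout of the notice, and the whitespace-based word matching are assumptions; the disclosure only requires that the notice be a set of binary data, and actual transmission to the report receiving computer(s) is omitted.

```python
import json


def search_for_null_set(event_words, database):
    """Return (hits, notice); notice is bytes only when the null set is found.

    database maps a file name to its text (hypothetical layout).
    """
    words = {w.lower() for w in event_words}
    hits = [name for name, text in database.items()
            if words <= set(text.lower().split())]
    if hits:
        return hits, None
    # No single file contains every element: build the notice as binary data.
    notice = json.dumps({
        "null_set": True,
        "event": sorted(words),
        "files_searched": len(database),
    }).encode("utf-8")
    return hits, notice
```

A caller would transmit or broadcast the returned bytes whenever the hit list is empty.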

For example, assume that a user and/or computer logic generated a synthetic event describing factors related to scientific laboratory research being performed by a user. This user activity may be a study of blood samples from a particular set of patients that have hepatitis, are over 65 years of age, and are omnivores (collectively referred to as “synthetic event A”). A search of electronic files, using the process described herein, reveals one or more electronic files that contain synthetic event A, and which also contain non-synthetic event elements, such as a reference to a particular athletic team.

There may be nothing in the identified synthetic event containing electronic file, or in any other electronic file, that correlates synthetic event A with being a fan of this particular athletic team, since the two seemingly are unrelated. However, context-based computer logic can make a suggestion that the two are related, and will thus generate a recommendation to the user to study their connection. Thus, a recommendation can be computer generated to 1) study health hazards associated with attending games played by this particular athletic team. That is, a recommendation can then be made to research topics related to the specific athletic team, which may lead to a vendor identified by the context-based computer logic, or it may lead to previously unreported factors (e.g., housekeeping/sanitation processes in place at the stadium venue of this particular athletic team, etc.).

In one embodiment, this recommendation may be prompted by the context-based computer logic recognizing that many (more than some predetermined number/percentage) electronic files reference both synthetic event A and this particular athletic team.
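One way to read this threshold test is as a co-occurrence fraction, sketched below. The function name should_recommend, the default threshold of 0.5, and the choice to measure the fraction over files that contain the synthetic event are assumptions; the disclosure leaves the predetermined number/percentage open.

```python
def should_recommend(files, event_terms, element, min_fraction=0.5):
    """True when enough event-containing files also reference the element.

    files        -- list of file texts
    event_terms  -- terms that together make up the synthetic event
    element      -- the non-synthetic event element (e.g., a team name)
    min_fraction -- hypothetical predetermined percentage, as a fraction
    """
    terms = [t.lower() for t in event_terms]
    with_event = [f.lower() for f in files
                  if all(t in f.lower() for t in terms)]
    if not with_event:
        return False  # nothing contains the event, so nothing to correlate
    both = sum(1 for f in with_event if element.lower() in f)
    return both / len(with_event) > min_fraction
```

The recommendation is prompted only when the fraction exceeds the threshold, matching the "more than some predetermined number/percentage" wording.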

In another embodiment, this recommendation may be prompted by the context-based computer logic associating this particular athletic team to a particular venue (in one electronic file), associating this particular venue to a particular vendor (in another electronic file), and this particular vendor to a health code violation citation (in yet another electronic file), thus leading the context-based computer logic to recognize a possible connection between synthetic event A and the particular athletic team.
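The chain of associations across separate files can be modeled as a path search over a graph, as in the following sketch. Treating each association as an undirected edge and using breadth-first search are assumptions chosen for illustration; the disclosure does not specify how the context-based computer logic links the files.

```python
from collections import deque


def find_chain(associations, start, goal):
    """Breadth-first search for a chain of associations from start to goal.

    associations -- (a, b) pairs, each drawn from a different electronic file
    Returns the chain as a list of nodes, or None when no chain exists.
    """
    graph = {}
    for a, b in associations:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical associations, one per electronic file.
links = [("team", "venue"), ("venue", "vendor"),
         ("vendor", "health code citation")]
```

A non-empty result such as the team-to-citation chain is what would lead the logic to recognize a possible connection.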

Alternatively, a recommendation can be made to 2) perform additional laboratory tests on the blood samples from the particular set of patients. For example, assume that the current scientific laboratory research is only directed to making a microscopic examination of the blood samples. Based on the identified non-synthetic event element(s), a recommendation may be made to perform a genetic study of the blood samples, in order to determine if there are any genetic mutations associated with both synthetic event A and this particular athletic team. This proposition may initially appear unfounded. However, an examination of the genome's integrity may offer clues/information that actually supports the proposition, or at least offers guidance in a new research direction.

In another embodiment of the present invention, assume that the synthetic event describes factors related to diagnosing a medical patient. That is, assume that the synthetic event is that a particular patient has hypertension, is over 65 years of age, and is an omnivore (collectively referred to as “synthetic event B”). However, the health care provider is unable to diagnose a secondary disease (which is caused by the primary disease of hypertension) based on these factors and the patient's complaint of chronic fatigue. A search of the electronic files databases (e.g., from the Internet) reveals one or more documents (synthetic event containing electronic files) that include the synthetic event B as well as the non-synthetic event element, in which a local power generation plant is referenced. Based on the type of analysis described above, a recommendation may be generated to 1) conduct an activity that is unrelated to the original synthetic event, or 2) conduct additional steps related to the original synthetic event. That is, 1) a recommendation can be made to study environmental issues around the identified local power generation plant. Alternatively, 2) a recommendation can be made to the health care provider to perform a genetic study of the particular patient. Again, while such a study would initially appear to be unnecessary, the synthetic event containing electronic file provides the necessary information to prompt such additional testing.

The process depicted in FIG. 2 ends at terminator block 212.

Thus, the present process provides a novel method for identifying “holes” in literature, research, medical diagnoses, etc. that would not be apparent. That is, the report and/or recommendation discussed in block 210 provide multiple users with the motivation to “fill” such “holes”, even if they were not aware of the “holes” before.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of various embodiments of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Note further that any methods described in the present disclosure may be implemented through the use of a VHDL (VHSIC Hardware Description Language) program and a VHDL chip. VHDL is an exemplary design-entry language for Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), and other similar electronic devices. Thus, any software-implemented method described herein may be emulated by a hardware-based VHDL program, which is then applied to a VHDL chip, such as an FPGA.

Having thus described embodiments of the invention of the present application in detail and by reference to illustrative embodiments thereof, it will be apparent that modifications and variations are possible without departing from the scope of the invention defined in the appended claims.

Claims

1. A computer hardware-implemented method of identifying a null set of synthetic event containing files in a database of electronic files, the computer hardware-implemented method comprising:

defining a synthetic event, wherein the synthetic event is a non-executable descriptor of a set of context-related factors, wherein the synthetic event is an occurrence of a set of words in a single document, wherein a factor in the set of context-related factors is the occurrence of all words in the set of words, and wherein a context of the set of context-related factors is the single document containing all of the words in the set of words;
performing a context-based search of a database of electronic files to identify a synthetic event containing electronic file, wherein the synthetic event containing electronic file comprises the synthetic event; and
in response to determining that there are no electronic files in the database of electronic files that contain the synthetic event, broadcasting a set of binary data that identifies the null set of synthetic event containing files in the database of electronic files.

2. The computer hardware-implemented method of claim 1, wherein the synthetic event further describes factors related to a user activity, and wherein the computer hardware-implemented method further comprises:

generating the recommendation to perform additional steps related to the user activity.

3. The computer hardware-implemented method of claim 1, wherein the synthetic event further describes factors related to a user activity, wherein the user activity is diagnosing a medical patient, and wherein the computer hardware-implemented method further comprises:

generating the recommendation to perform additional medical tests, on the medical patient, which are related to the identified synthetic event containing electronic file.

4. The computer hardware-implemented method of claim 1, further comprising:

limiting the context-based search to search only files that are not related to activities that generated the synthetic event, wherein an activity that generated the synthetic event was medical research, and wherein the context-based search is limited to searching non-medical literature; and
establishing a connection between the synthetic event and non-synthetic event elements found in the non-medical literature.

5. The computer hardware-implemented method of claim 1, further comprising:

ranking a source of the identified synthetic event containing electronic file, wherein the ranking is based on a public reputation of the source; and
weighting the identified synthetic event containing electronic file based on said ranking.

6. The computer hardware-implemented method of claim 1, further comprising:

ranking a source of the synthetic event containing electronic file, wherein the ranking is based on an historical frequency of usage of the source by a generator of the synthetic event; and
weighting the identified synthetic event containing electronic file based on said ranking.

7. A computer program product for identifying a null set of synthetic event containing files in a database of electronic files, the computer program product comprising:

a non-transitory computer readable storage media;
first program instructions to define a synthetic event, wherein the synthetic event is a non-executable descriptor of a set of context-related factors, wherein the synthetic event is an occurrence of a set of words in a single document, wherein a factor in the set of context-related factors is the occurrence of all words in the set of words, and wherein a context of the set of context-related factors is the single document containing all of the words in the set of words;
second program instructions to perform a context-based search of a database of electronic files to identify a synthetic event containing electronic file, wherein the synthetic event containing electronic file comprises the synthetic event; and
third program instructions to, in response to determining that there are no electronic files in the database of electronic files that contain the synthetic event, transmit a set of binary data to the requesting computer, wherein the set of binary data identifies the null set of synthetic event containing files in the database of electronic files; and wherein the first, second, and third program instructions are stored on the non-transitory computer readable storage media.

8. The computer program product of claim 7, wherein the synthetic event further describes factors related to a user activity, and wherein the computer program product further comprises:

fourth program instructions to generate the recommendation to perform additional steps related to the user activity; and wherein the fourth program instructions are stored on the non-transitory computer readable storage media.

9. The computer program product of claim 7, wherein the synthetic event further describes factors related to a user activity, wherein the user activity is scientific laboratory research, and wherein the computer program product further comprises:

fourth program instructions to generate the recommendation to perform additional scientific laboratory research on topics related to the identified synthetic event containing electronic file; and wherein the fourth program instructions are stored on the non-transitory computer readable storage media.

10. The computer program product of claim 7, wherein the synthetic event further describes factors related to a user activity, wherein the user activity is diagnosing a medical patient, and wherein the computer program product further comprises:

fourth program instructions to generate the recommendation to perform additional medical tests, on the medical patient, which are related to the identified synthetic event containing electronic file, wherein the synthetic event is a combination of facts about a patient, wherein the facts about the patient include the patient's age, a medical diagnosis of a primary disease currently afflicting the patient, and a list of medications being taken by the patient, wherein the patient's age, the medical diagnosis of the primary disease currently afflicting the patient, and the list of medications being taken by the patient are factors in the context-related factors, and wherein a context of the context-related factors is the patient being diagnosed for a secondary disease that is caused by the primary disease; and wherein the fourth program instructions are stored on the non-transitory computer readable storage media.

11. The computer program product of claim 7, wherein the synthetic event further describes a user activity, and wherein the computer program product further comprises:

fourth program instructions to determine a context for the context-based search based on an activity type of the user activity; and wherein the fourth program instructions are stored on the non-transitory computer readable storage media.

12. The computer program product of claim 7, further comprising:

fourth program instructions to rank a source of the synthetic event containing electronic file, wherein the ranking is based on a public reputation of the source; and
fifth program instructions to weight the identified synthetic event containing electronic file based on said ranking; and wherein the fourth and fifth program instructions are stored on the non-transitory computer readable storage media.

13. The computer program product of claim 7, further comprising:

fourth program instructions to rank a source of the identified synthetic event containing electronic file, wherein the ranking is based on an historical frequency of usage of the source by a generator of the synthetic event; and
fifth program instructions to weight the identified synthetic event containing electronic file based on said ranking; and wherein the fourth and fifth program instructions are stored on the non-transitory computer readable storage media.

14. A computer system comprising:

a central processing unit (CPU), a computer readable memory, and a non-transitory computer readable storage media;
first program instructions to define a synthetic event, wherein the synthetic event is a non-executable descriptor of a set of context-related factors, wherein the synthetic event is an occurrence of a set of words in a single document, wherein a factor in the set of context-related factors is the occurrence of all words in the set of words, and wherein a context of the set of context-related factors is the single document containing all of the words in the set of words;
second program instructions to perform a context-based search of a database of electronic files to identify a synthetic event containing electronic file, wherein the synthetic event containing electronic file comprises the synthetic event; and
third program instructions to, in response to determining that there are no electronic files in the database of electronic files that contain the synthetic event, transmit a set of binary data to the requesting computer, wherein the set of binary data identifies the null set of synthetic event containing files in the database of electronic files; and wherein the first, second, and third program instructions are stored on the non-transitory computer readable storage media for execution by the CPU via the computer readable memory.

15. The computer system of claim 14, wherein the synthetic event further describes factors related to a user activity, and wherein the computer system further comprises:

fourth program instructions to generate the recommendation to perform additional steps related to the user activity; and wherein the fourth program instructions are stored on the non-transitory computer readable storage media for execution by the CPU via the computer readable memory.

16. The computer system of claim 14, wherein the synthetic event further describes factors related to a user activity, wherein the user activity is scientific laboratory research, and wherein the computer system further comprises:

fourth program instructions to generate a recommendation to research topics related to the identified synthetic event containing electronic file, wherein the synthetic event is a set of features being examined in a scientific laboratory while studying a particular disease, wherein a context of the set of context-related factors is a research project that is directed towards understanding the etiology of a particular disease, and wherein factors of the set of context-related factors are a phenotype, genotype, and exposure to specific chemicals common to persons having the particular disease; and wherein the fourth program instructions are stored on the non-transitory computer readable storage media for execution by the CPU via the computer readable memory.

17. The computer system of claim 14, wherein the synthetic event further describes factors related to a user activity, wherein the user activity is diagnosing a medical patient, and wherein the computer system further comprises:

fourth program instructions to generate the recommendation to perform additional scientific laboratory research on topics related to the identified synthetic event containing electronic file; and wherein the fourth program instructions are stored on the computer readable storage media for execution by the CPU via the non-transitory computer readable memory.

18. The computer system of claim 14, wherein the synthetic event further describes factors related to a user activity, and wherein the computer system further comprises:

fourth program instructions to determine a context for the context-based search based on an activity type of the user activity; and wherein the fourth program instructions are stored on the non-transitory computer readable storage media for execution by the CPU via the computer readable memory.

19. The computer system of claim 14, further comprising:

fourth program instructions to rank a source of the identified synthetic event containing electronic file, wherein the ranking is based on a public reputation of the source; and
fifth program instructions to weight the identified synthetic event containing electronic file based on said ranking; and wherein the fourth and fifth program instructions are stored on the non-transitory computer readable storage media for execution by the CPU via the computer readable memory.
Patent History
Patent number: 8898165
Type: Grant
Filed: Jul 2, 2012
Date of Patent: Nov 25, 2014
Patent Publication Number: 20140006419
Assignee: International Business Machines Corporation (Armonk, NY)
Inventors: Robert R. Friedlander (Southbury, CT), James R. Kraemer (Santa Fe, NM), Josko Silobrcic (Swampscott, MA)
Primary Examiner: Hung T Vy
Application Number: 13/540,295
Classifications
Current U.S. Class: Preparing Data For Information Retrieval (707/736); Generating An Index (707/741)
International Classification: G06F 17/30 (20060101); G06F 7/00 (20060101);