RAPID SYNDROME ANALYSIS APPARATUS AND METHOD
In a computer, a computer readable medium for providing rapid syndrome analysis includes instructions for measuring a plurality of CUSUMs simultaneously and generating CUSUM data in response to an observed value. It also includes instructions for comparing the CUSUM data with a predetermined plurality of threshold values and instructions for generating a signal when the CUSUM data exceeds any of the plurality of threshold values by a predetermined amount.
This application is a divisional of U.S. application Ser. No. 11/089,660, filed on Mar. 24, 2005, the disclosure of which is incorporated herein by reference.
FIELD OF INVENTION
This invention relates to a rapid syndrome analysis apparatus and method. In particular, according to one embodiment, the invention relates, in a computer, to a computer readable medium for providing rapid syndrome analysis including instructions for measuring a plurality of CUSUMs simultaneously and generating CUSUM data in response to an observed value. Further, instructions are provided for comparing the CUSUM data with a predetermined plurality of threshold values. Finally, instructions are provided for generating a signal when the CUSUM data exceeds any of the plurality of threshold values by a predetermined amount.
BACKGROUND OF THE INVENTION
Mathematical algorithms have been used to identify disease clusters and are a key component of syndromic surveillance software used to monitor for bioterrorism as well as naturally occurring disease outbreaks. Examples of such algorithms are widespread. One study of the use of such algorithms of which Applicant is aware was made by DARPA. (See, e.g., DARPA Bio-ALIRT Program Technical Report, “Evaluation of Algorithms for Outbreak Detection Using Clinical Data from Five U.S. Cities” (Oct. 15, 2004).)
With particular relevance to this invention, a statistical method for quality improvement in industry is known as the “Fast Initial Response Cumulative Sum” (FIR CUSUM) method. This method is well known and is explained in detail, for example, in Ryan, T. P., “Statistical Methods for Quality Improvement,” John Wiley & Sons, New York (1989), pp. 110-112.
The standard FIR CUSUM test statistic calculates the deviation $q_t$ of each day's value ($t$) from the moving mean ($\bar{x}$). The value $S_t(q)$ accumulates these deviations ($q_t$), but only to the extent that they exceed the mean by a threshold value ($k$). When the accumulated sum of deviations exceeds a preset limit $h$, a “signal” is generated, and the sum $S_t$ is reset to a starting value, $S_{\text{reset}}$, for analyses continuing on succeeding days. All deviations ($q_t$), thresholds ($k$) and limits ($h$) are expressed in standardized units (sigmas).
$$S_0 = 0$$
$$S_t(q) = \max\left[0,\; S_{t-1} + q_t - k\right]$$
$h$ = threshold value
If $S_t(q) > h$, then reset $S_t(q) = S_{\text{reset}}$
where
$$q_t = \frac{x_t - \bar{x}}{\text{std. deviation}}$$
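For reference, the standard procedure above can be sketched in Python. This is a minimal illustration under stated assumptions and is not the Applicant's method. The baseline is taken as an equally weighted window of the preceding `window` days, and the "std. deviation" term is taken as the sample standard deviation (`statistics.stdev`). The function name `fir_cusum`, the data, and the parameter values are all hypothetical.

```python
import statistics


def fir_cusum(series, window, k, h, s_reset):
    """Standard FIR CUSUM over a daily series (hypothetical helper).

    Each day t (once `window` prior days exist) is standardized against the
    equally weighted mean and sample standard deviation of the preceding
    `window` days. Returns the indices of the days that generate a signal.
    """
    s = 0.0  # S_0 = 0
    signals = []
    for t in range(window, len(series)):
        baseline = series[t - window:t]
        mean = statistics.fmean(baseline)
        sd = statistics.stdev(baseline)
        q = (series[t] - mean) / sd      # q_t in sigma units
        s = max(0.0, s + q - k)          # S_t = max[0, S_{t-1} + q_t - k]
        if s > h:
            signals.append(t)
            s = s_reset                  # reset to S_reset after a signal
    return signals


# Hypothetical data: two consecutive high days at indices 7 and 8.
daily = [5, 6, 5, 6, 5, 6, 5, 12, 12, 5]
print(fir_cusum(daily, window=7, k=0.5, h=4.0, s_reset=2.0))  # prints: [7]
```

In this example only the first of the two consecutive high days signals. By the second day, the first high day has already entered the window and inflated the moving mean and sigma, so $q_t$ is much smaller and the reset sum stays below $h$. This is the kind of behavior described in weaknesses 3) and 5) below.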
Attempts have been made to apply the standard FIR CUSUM procedure to the problem of detecting and rapidly identifying the onset and outbreak of diseases. Nonetheless, there are many limitations of this standard FIR CUSUM procedure when applied to syndromatic surveillance data. By way of example only and not by limitation, some weaknesses of the FIR CUSUM are listed as follows:
1) The user or programmer must specify an “interval width,” the period of time over which the moving mean and standard deviation are calculated. The user must also specify the three model parameters ($h$, $k$, and $S_{\text{reset}}$). However, the parameter values with the “best” sensitivity and specificity depend on the type, shape, amplitude and duration of the outbreak one wishes to detect. These characteristics of outbreaks are different for each disease and depend on the mode of exposure, the magnitude of exposure, the location of exposure, and numerous other variables that cannot be known in advance.
2) The known FIR CUSUM procedure weights all data in the moving average window equally, making no allowance for the natural weekly periodicity seen in many healthcare settings.
3) The mean and standard deviation of the known FIR CUSUM procedure are heavily influenced by outliers, including zero values (in practice, zeros frequently represent missing data).
4) The known FIR CUSUM procedure cannot quantify the “unusualness” or public health importance of a signal or flag. An outbreak involving 10 persons ill generates the same signal as one involving 200 people.
5) The known FIR CUSUM procedure does not quantify the duration of a signal or flag. In fact, it “resets” after every out-of-control signal is generated, so consecutive days with unusual values are frequently missed.
The FIR CUSUM method is not the only statistical method that exhibits these weaknesses when applied to real-world healthcare syndromes. Most of the available statistical methods that have been applied to outbreak detection were adapted from engineering and quality control applications, and they have serious deficiencies in terms of sensitivity and specificity. Sensitivity is the proportion of true outbreaks that are detected (true positives). Specificity is the proportion of non-outbreak periods that are correctly not flagged (true negatives). To maximize sensitivity and specificity, these algorithms require a long “training period” (a year or more of baseline data to “learn” what kinds of peaks comprise true positives and true negatives). This requirement for a large baseline set of data is often impractical because:
1) Systems to collect syndromic surveillance data are only now being developed.
2) There are many situations where it is impossible to obtain comparable data. For example, the Olympic Games cause a large influx of people into a small geographic area for a limited period of time, so no comparable “baseline data” exist.
3) These algorithms are tuned to detect outbreaks similar to those that have occurred in the past. Bioterrorism attacks may produce outbreaks unlike any we have seen in the past.
Also, the statistical methods proposed for outbreak detection are designed to detect only the beginning of a signal, and provide no subsequent information about the amplitude, shape or duration of the signal. Because these systems, including the FIR CUSUM, were developed for manufacturing and quality control settings, they assume that there is an easy way to confirm or identify the signal once it is detected. In disease surveillance, this assumption does not hold. There is rarely any easy way to confirm or identify the cause of any particular cluster, and the epidemiologist needs ongoing daily information about the amplitude, shape and duration of the outbreak to mount a proper investigation.
Thus, there is a need in the art for an apparatus and method for rapidly analyzing health related data that does not require a long “training period”, that signals the outbreak of an illness quickly and that provides the user detailed information about the amplitude, shape and duration of the outbreak. It therefore is an object of the invention to provide an apparatus and method for providing rapid syndrome analysis that is easy to use and interpret and that is flexible and scalable.
SUMMARY OF THE INVENTION
Accordingly, the rapid syndrome analysis apparatus and method, according to one aspect of the invention, includes, in a computer, a computer readable medium with instructions for measuring a plurality of CUSUMs simultaneously and generating CUSUM data in response to an observed value. Instructions are also provided for comparing the CUSUM data with a predetermined plurality of threshold values. Further, instructions are provided for generating a signal when the CUSUM data exceeds any of the plurality of threshold values by a predetermined amount.
According to a further aspect of the invention, the computer readable medium includes instructions for ignoring CUSUM data which previously resulted in the generation of a signal. In a further aspect, the computer readable medium includes instructions for measuring a plurality of CUSUMs simultaneously wherein the instructions include instructions for measuring data from a group of time blocks measured from a current date. According to another aspect of this embodiment, the group of time blocks consists of seven days of data, fourteen days of data, twenty-one days of data and twenty-eight days of data measured from the current date. According to a further aspect of this invention, the instructions for measuring a plurality of CUSUMs simultaneously comprise instructions for measuring four separate CUSUMs for each time block in the group of time blocks.
According to a further aspect of the invention, the computer readable medium with instructions for measuring a plurality of CUSUMs simultaneously and generating CUSUM data includes instructions for creating a moving baseline of CUSUM data with a declining weight given to older data. In accordance with another aspect, the instructions for measuring a plurality of CUSUMs simultaneously and generating CUSUM data include instructions for ordering the CUSUM data for a current date such that days that are the same day of the week as the current date are given more weight. In another aspect of the invention, the instructions for comparing the CUSUM data with a predetermined plurality of threshold values include instructions for comparing the CUSUM data with four separate threshold values increasing from lowest to highest.
According to a further aspect of the invention, instructions for generating a signal include instructions for generating a “trimmed weighted” mean and sigma. According to a further aspect, the instructions for generating a trimmed weighted mean and sigma include instructions for excluding certain zero values. According to another aspect, the invention includes instructions for calculating the duration of an event identified by one or more signals. According to a further aspect, instructions are provided for estimating the number of people ill with regard to an event identified by one or more signals. According to a further aspect of the invention, instructions are provided for generating data on the amplitude, shape and duration of an event identified by a signal.
In another embodiment of the invention, in a computer, a computer readable medium for providing rapid syndrome analysis includes instructions for measuring a plurality of CUSUMs simultaneously and generating CUSUM data with a moving baseline over a set period in response to an observed value. Instructions are also provided for comparing the CUSUM data with a predetermined plurality of threshold values. Further, instructions are provided for generating a signal when the CUSUM data exceeds any of the plurality of threshold values by a predetermined amount and, instructions are provided for generating a trimmed mean and sigma such that CUSUM data which previously resulted in the generation of a signal is excluded.
In another aspect of the invention, the instructions for measuring the plurality of CUSUMs simultaneously include instructions for measuring data from a group of time blocks measured from a current date wherein the group of time blocks consists of seven days of data, fourteen days of data, twenty-one days of data and twenty-eight days of data measured from the current date. In another aspect of the invention, the instructions for generating a trimmed mean and sigma include instructions for excluding certain zero values.
In another embodiment of the invention, a method for providing rapid syndrome analysis includes the steps of measuring a plurality of CUSUMs simultaneously and generating CUSUM data with a moving baseline over a set period in response to an observed value. Next, the CUSUM data is compared with a predetermined plurality of threshold values, and a signal is generated when the CUSUM data exceeds any of the plurality of threshold values by a predetermined amount. Finally, a trimmed weighted mean and sigma are generated such that CUSUM data which previously resulted in the generation of a signal is excluded.
In another aspect of this invention, the step of measuring a plurality of CUSUMs simultaneously includes the step of measuring data from a group of time blocks measured from the current date wherein the group of time blocks consists of seven days of data, fourteen days of data, twenty-one days of data and twenty-eight days of data measured from the current date. In another aspect of the invention, the step of generating a trimmed mean and sigma includes the step of excluding certain zero values.
In another aspect of the invention, the instructions for generating flags include instructions for flagging consecutive days so that ongoing outbreaks continue to generate daily flags until the number of observed cases begins to decline.
Other objects, features and advantages of the present invention will become more fully apparent from the following detailed description of the preferred embodiments and the accompanying drawings in which:
The preferred embodiment of the present invention is illustrated by way of example in
Referring now to
Further, there are four different sets of values of H 32, K 34 and reset value 36 for each of the four time block intervals 30. Referring to
Row 44 is the second set of parameters for the seven day interval 30. Here, H 32 is “4.0”, K 34 is “2.50” and the reset value 36 is “2.0”. Should the observed values result in a CUSUM equal to or greater than the H 32 value of 4.0 in row 44, a signal 40 is generated. In this case the signal 40 is a Yellow flag 46.
Rows 48 and 50 are the third and fourth sets of parameters for the seven day interval 30. An Orange flag 52 and a Red flag 54 are generated when the observed data result in a number equal to or greater than the H 32 values on those rows.
As can be appreciated, Applicant's invention, according to one embodiment, provides for the simultaneous measurement of multiple CUSUMs. These multiple CUSUMs are compared to a plurality of threshold values. This is an important aspect of the invention in that it enables the user to identify outbreaks of a wide variety of unknown types quickly and repeatedly as will be discussed more fully hereafter.
In another embodiment, as can be seen from
The formula for the cumulative sums (CUSUMs) according to the present invention is:
$$S(t) = S(t-1) + Q(t) + K$$
where
S(t) is the CUSUM of the current day and
S(t−1) is the CUSUM from the day before the current day;
$$Q(t) = \frac{X - \text{Mean}_{\text{weighted}}}{\text{Sigma}_{\text{weighted}}}$$
where
X = the observed value for the current day (t);
$\text{Mean}_{\text{weighted}}$ = weighted average of included interval values;
$\text{Sigma}_{\text{weighted}}$ = weighted standard deviation of included interval values;
K=threshold as adjusted as set forth in
When S(t) is greater than H 32, the appropriate signal 40 color flag is generated, and the reset value 36 is used as S(t−1) for the following day. If S(t) is negative, the reset value 36 is zero, so the following day starts from S(t−1) = 0.
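The daily update described above can be sketched in Python for several parameter rows at once. This is a hedged sketch, not the Applicant's implementation. The class `CusumRow`, the function `update_cusums`, and all of their field names are hypothetical. The "Adjust" column and the adjustment of K are not modeled. Because the formula as written adds K, the sketch treats `k` as a signed term. The negative K values in the usage lines are purely illustrative and are not taken from the parameter table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CusumRow:
    """One row of the parameter table (hypothetical field names)."""
    interval: int   # baseline interval 30, in days (7, 14, 21 or 28)
    color: str      # flag color this row produces
    h: float        # H 32
    k: float        # signed K term, added as in S(t) = S(t-1) + Q(t) + K
    reset: float    # reset value 36, carried as S(t-1) after a flag


def update_cusums(prev, x, stats, rows):
    """Advance every row's CUSUM by one day.

    prev:  list of S(t-1) values, one per row
    x:     today's observed value X
    stats: dict mapping interval -> (weighted mean, weighted sigma)
    rows:  list of CusumRow, e.g. 4 intervals x 4 colors = 16 rows
    Returns (S(t-1) values for tomorrow, colors flagged today).
    """
    carried, flags = [], []
    for s_prev, row in zip(prev, rows):
        mean, sigma = stats[row.interval]
        q = (x - mean) / sigma          # Q(t)
        s = s_prev + q + row.k          # S(t)
        if s > row.h:
            flags.append(row.color)
            carried.append(row.reset)   # flagged: carry the reset value
        elif s < 0:
            carried.append(0.0)         # negative: carry zero
        else:
            carried.append(s)           # otherwise carry S(t) itself
    return carried, flags


# Hypothetical rows and statistics, for illustration only.
rows = [CusumRow(7, "blue", 3.0, -0.5, 1.0), CusumRow(7, "yellow", 4.0, -2.5, 2.0)]
stats = {7: (5.0, 1.0)}
state, flags = update_cusums([0.0, 0.0], 7, stats, rows)
print(state, flags)  # prints: [1.5, 0.0] []
state, flags = update_cusums(state, 8, stats, rows)
print(state, flags)  # prints: [1.0, 0.5] ['blue']
```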
As used herein, the term “CUSUM data” includes all the elements, terms and results of these formulas as utilized by the invention set forth herein.
Applicant's rapid syndrome analysis apparatus, according to one embodiment, uses intervals 30 that are multiples of seven days. According to a further embodiment, it weights each day in a moving baseline window differently in order to minimize the effects of weekly periodicity. Applicant's invention uses no “guard band,” which means the baseline interval begins with the day immediately prior to the “Test Date”. Each day in this interval is given a linearly declining weight in the following order: days that are the same day of the week as the “Test Date” are weighted most heavily (most recent first), followed by all remaining values (most recent to least recent). For example, if the test date is Monday, then a fourteen day interval 30 moving average is calculated using the weights Last Monday=14, 2-weeks-ago Monday=13, Yesterday (Sunday)=12, Day Before Yesterday (Saturday)=11, Last Friday=10, Last Thursday=9, Last Wednesday=8, Last Tuesday=7, 2-weeks-ago Sunday=6, 2-weeks-ago Saturday=5, 2-weeks-ago Friday=4, 2-weeks-ago Thursday=3, 2-weeks-ago Wednesday=2, 2-weeks-ago Tuesday=1.
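The weighting order above can be sketched in Python. The function names are hypothetical. The text does not define the weighted standard deviation, so the sketch assumes the common population-style form $\sqrt{\sum w_i (x_i - \bar{x}_w)^2 / \sum w_i}$. For a fourteen-day interval, the sketch reproduces the Monday example: Last Monday = 14, 2-weeks-ago Monday = 13, Yesterday = 12, 2-weeks-ago Sunday = 6, and 2-weeks-ago Tuesday = 1.

```python
import math


def baseline_weights(interval):
    """Map day offset before the Test Date (1 = yesterday) to its weight.

    No guard band: the baseline starts with the day immediately prior to the
    Test Date. Same-weekday days (offsets 7, 14, ...) come first, most recent
    first, then all remaining days, most recent first. Weights decline
    linearly from `interval` down to 1 in that order.
    """
    if interval % 7 != 0:
        raise ValueError("interval must be a multiple of seven days")
    offsets = range(1, interval + 1)
    same_weekday = [d for d in offsets if d % 7 == 0]
    others = [d for d in offsets if d % 7 != 0]
    return {day: interval - rank for rank, day in enumerate(same_weekday + others)}


def weighted_mean_sigma(values_by_offset, weights):
    """Weighted mean and (assumed) population-style weighted standard deviation."""
    total = sum(weights[d] for d in values_by_offset)
    mean = sum(weights[d] * x for d, x in values_by_offset.items()) / total
    var = sum(weights[d] * (x - mean) ** 2 for d, x in values_by_offset.items()) / total
    return mean, math.sqrt(var)


w = baseline_weights(14)  # Test Date is a Monday
print(w[7], w[14], w[1], w[8], w[13])  # prints: 14 13 12 6 1
```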
Further, according to one embodiment, a “trimmed weighted” mean and standard deviation are calculated to remove the influence of outliers on these statistics as follows:
a. Extreme values are excluded. “Extreme” is defined as the highest and lowest (100/Interval) percent of values. For a seven day interval 30, this is the highest and lowest 14% (approximately) of values. For a twenty-eight day interval 30, it is the highest and lowest 3.6% (approximately) of values. In either case, (100/N) percent of N values amounts to a single value at each extreme; and/or
b. Only non-flagged days prior to the current date are included. That is, the Baseline interval 30 is extended back from the “Test Date” until it includes the required seven, fourteen, twenty-one, or twenty-eight non-flagged values.
c. Zero values are excluded from the calculations of the mean but they are included in the standard deviation unless the value zero is within 4 sigmas of the last calculated moving weighted mean. This is a very important and non-obvious improvement of the Applicant's invention over the prior art.
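Rules a. through c. can be combined as in the following sketch. Several choices here are assumptions rather than statements from the text:

- Rule (b) is applied first, then rule (a), then rule (c).
- Weights are simply paired with the selected days in order.
- The weighted sigma uses the population-style form.
- Rule (c) is implemented as a literal reading of the sentence above: zeros never enter the mean, and they enter the sigma only when zero is not within 4 sigmas of the last moving weighted mean.

All names and data are hypothetical.

```python
import math


def select_baseline(history, interval):
    """Rule (b): walk back from the day before the Test Date, skipping flagged
    days, until `interval` non-flagged values have been collected.

    history: (value, flagged) pairs, most recent first.
    """
    kept = [value for value, flagged in history if not flagged]
    if len(kept) < interval:
        raise ValueError("not enough non-flagged history for this interval")
    return kept[:interval]


def _weighted_mean(pairs):
    total = sum(w for _, w in pairs)
    return sum(x * w for x, w in pairs) / total


def _weighted_sigma(pairs, mean):
    total = sum(w for _, w in pairs)
    return math.sqrt(sum(w * (x - mean) ** 2 for x, w in pairs) / total)


def trimmed_weighted_stats(values, weights, last_mean, last_sigma):
    """Trimmed weighted mean and sigma for one baseline interval.

    values, weights: equal-length lists for the selected baseline days.
    last_mean, last_sigma: previously calculated moving weighted statistics.
    """
    # Rule (a): (100/N)% of N values is one value, so drop one at each extreme.
    trimmed = sorted(zip(values, weights))[1:-1]
    # Rule (c), literal reading: zeros never enter the mean ...
    nonzero = [(x, w) for x, w in trimmed if x != 0]
    if not nonzero:
        raise ValueError("no non-zero values left after trimming")
    mean = _weighted_mean(nonzero)
    # ... and enter the sigma only when zero is NOT within 4 sigmas of last_mean.
    zero_within_4_sigma = abs(0 - last_mean) <= 4 * last_sigma
    sigma_pairs = nonzero if zero_within_4_sigma else trimmed
    return mean, _weighted_sigma(sigma_pairs, mean)


# Hypothetical history (most recent first); the flagged value 30 is skipped.
history = [(5, False), (0, False), (30, True), (6, False), (0, False),
           (7, False), (20, False), (6, False)]
values = select_baseline(history, 7)
weights = [7, 6, 5, 4, 3, 2, 1]  # hypothetical weights
mean, sigma = trimmed_weighted_stats(values, weights, last_mean=6.0, last_sigma=2.0)
print(mean, sigma)  # prints: 5.75 0.75
```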
Still further, once a signal (flag) 40 is generated, consecutive daily values of equal or higher value continue to generate daily flags 40, and daily signals 40 continue to be generated as long as daily values stay above the mean. This is because, regardless of the value of any CUSUM, if a daily value is greater than the moving mean, the flag is allowed to decline by no more than a certain value (V) each day. V is calculated by the following formula:
$$V = \operatorname{Integer}\left(\frac{X_{t-1}}{X_t}\right)$$
where $X_t$ = today's observed value, and
$X_{t-1}$ = yesterday's observed value
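One way to read the carry-forward rule is sketched below. The sketch treats a "flag" as a color level from 1 (blue) to 4 (red), using the flag weights given later in this description. It reads "decline by V" as a floor of yesterday's level minus V. Both readings are assumptions, as are the use of truncating integer conversion for Integer(·) and the function name.

```python
FLAG_LEVELS = {0: "none", 1: "blue", 2: "yellow", 3: "orange", 4: "red"}


def carried_flag_level(cusum_level, prev_level, x_today, x_yesterday, moving_mean):
    """Today's flag level (0-4) after applying the daily-decline limit V.

    cusum_level: highest level raised today by the CUSUMs themselves
    prev_level:  yesterday's final flag level
    """
    if x_today <= moving_mean:
        return cusum_level                   # limit applies only above the mean
    v = int(x_yesterday / x_today)           # V = Integer(X_{t-1} / X_t)
    return max(cusum_level, prev_level - v)  # decline by at most V per day


# Yesterday red (4); today's value halves but stays above the mean.
print(FLAG_LEVELS[carried_flag_level(0, 4, 10, 20, 8.0)])  # prints: yellow
```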
As a result, Applicant's invention enables users to determine the duration of an outbreak, which is another important and non-obvious improvement of the Applicant's invention over the prior art. The duration of the current outbreak is estimated to help the user appreciate its significance. The “duration” is calculated in accordance with Applicant's invention as:
duration = number of consecutive days with a non-zero flag.
In accordance with Applicant's invention, the “unusualness” or “seriousness” of the outbreak can also be readily determined. The “unusualness” of a signal is identified by color coding. In order of increasing “unusualness”, flags have increasing weights $f_{\text{blue}} = 1$, $f_{\text{yellow}} = 2$, $f_{\text{orange}} = 3$, and $f_{\text{red}} = 4$.
Still further, for count data, it is also possible with Applicant's invention to estimate the number of people ill in the current outbreak so as to further help the user appreciate its significance. The “number ill” is calculated in accordance with Applicant's invention as follows:
$$\text{Number ill}_{\text{estimated}} = (0.5)\,(\bar{X}_{\max})\,(\sigma_{\max})\,(f_w)\,(\text{duration})$$
where
$\bar{X}_{\max}$ = weighted mean of the interval producing the highest flag;
$\sigma_{\max}$ = weighted standard deviation of the interval producing the highest flag;
$f_w$ = weight of the highest flag.
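The duration and number-ill formulas translate directly into code. This is a minimal sketch. The function names and example figures are hypothetical. Duration is counted as the run of consecutive flagged days ending with the current day, with the daily flag levels listed most recent first.

```python
FLAG_WEIGHTS = {"blue": 1, "yellow": 2, "orange": 3, "red": 4}  # f_w


def outbreak_duration(daily_levels):
    """Number of consecutive flagged days ending today.

    daily_levels: flag levels, most recent first (0 = no flag that day).
    """
    duration = 0
    for level in daily_levels:
        if level == 0:
            break
        duration += 1
    return duration


def estimate_number_ill(xbar_max, sigma_max, highest_flag, duration):
    """Number ill = 0.5 * Xbar_max * sigma_max * f_w * duration."""
    return 0.5 * xbar_max * sigma_max * FLAG_WEIGHTS[highest_flag] * duration


# Hypothetical three-day outbreak whose highest flag is red.
days = outbreak_duration([4, 3, 2, 0, 1])
print(days, estimate_number_ill(8.0, 2.0, "red", days))  # prints: 3 96.0
```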
Referring now to
Referring now to
The table includes the following columns: Flag 56, color 58, interval 30, observed date 60, observed value 62, weighted moving average 64, weighted moving sigma 66, S(t−1) 68, Q 70, K 34, Adjust 72, H 32 and FIR CUSUM reset value 36. The operation of the formulas for determining these separate elements is as set forth above.
Referring to
For example, an observed value 62 of 7 raises the blue color 58 CUSUM S(t) to 2.8520. This is not greater than the H 32 value of 3, but because it is a positive number, it becomes the new S(t−1) 68 for the next day, April 2.
Although the observed value 62 decreases to 6 on April 2, it is sufficient, in accordance with Applicant's invention, to produce a CUSUM S(t) 74 of 3.7315, which is greater than the blue color requirement of 3 for H 32. Thus a color 58 blue flag 56 is shown in the table.
By way of further explanation, the CUSUM S(t) 74 is the result of the addition of the previous day's CUSUM or reset value S(t−1) 68 plus Q 70 plus K 34 plus adjust 72. Adjust 72 in the Table in
Continuing with the explanation, an observed value 62 of 7 on April 3 now results in both a blue and a yellow color 58 flag.
Referring to
Continuing now with reference to
On April 22-24, Applicant's invention detects another subtle outbreak, with 9 and 10 cases per day, and on April 28-May there is a second outbreak superimposed on this outbreak. Importantly, the present invention detects the second outbreak because it discards flagged values from the calculation of trimmed moving weighted means and sigmas, and it marks the second outbreak with an Orange flag.
A huge outbreak occurs from May 11 to May 28, which shows that the invention continues to flag every day of an ongoing outbreak (unlike the regular prior art CUSUM, which flags only the onset and then resets to 0). Note also that even though there is a “dropped” or missing data point in the middle of this huge outbreak (May 19), the apparatus and method 10 of the present invention quickly recognizes that this outbreak has not ended, giving a red flag 3 days later on May 22.
On June 1 Applicant's invention detects a one-day outbreak that occurs during the tail of the previous outbreak. These are very difficult to detect by any method other than the present invention because moving means by definition lag several to many days behind the day being tested. Therefore, with previous detection algorithms, the large mean and sigma from the preceding outbreak typically will mask any outbreak occurring in the 7-14 days following its peak. See for example the same data as analyzed by Prior Art systems shown in
In June and July, it is apparent that the baseline (or background level) of cases has risen, and the trimmed moving weighted mean varies between 7.5 and 8.0 cases per day. The present invention adjusts for this increased baseline and does not flag outbreaks at this level. Note that this level is higher than the outbreak level (6.5) detected in early April, which shows that the present invention responds to a changing baseline. Nonetheless, it does pick up the outbreak on June 22-27, and it also recognizes 1-day outbreaks on July 3 and July 7.
A large outbreak occurs from July 10-15 and is flagged with orange and red flags. Although its amplitude is not very high (13 cases on July 11), its duration is rather long (6 days), and it affects a large number of patients. This is why Applicant's invention generates a red flag.
The last outbreak on July 25-26 has a higher amplitude but only lasts for 2 days, so it generates 2 blue flags. The examples in July show the importance of using trimmed weighted means and sigmas. The wild fluctuations in the number of cases per day, including 3 days with zeroes (missing data) would cause the untrimmed mean and sigma to be completely unreliable. See for example and comparison again, the Prior Art system shown
Because it uses a plurality of CUSUMs (according to one embodiment, sixteen different combinations of H, K and reset values), the rapid syndrome analysis apparatus and method 10 is not limited to detecting certain specific diseases. In fact, Applicant currently monitors the eleven different syndromes listed below. Note that two of these (neuro-toxic and radiation sickness) are not related to infectious diseases but rather to chemical toxins or radiation threats.
1) Flu-like syndrome
2) Pox-like Rash syndrome
3) Pulmonary syndrome
4) Gastrointestinal syndrome
5) Hepatitis-like syndrome
6) Neuro-Toxic syndrome
7) Encephalitis-like syndrome
8) Systemic-illness syndrome
9) Sepsis-DIC syndrome (DIC=disseminated intravascular coagulation)
10) Radiation Sickness syndrome
11) SARS-like syndrome
Applicant's invention is of practical importance because there is a need for a mathematical technique to identify disease outbreaks in, for example, daily healthcare surveillance data. The types of data that could be analyzed using this algorithm include daily counts of:
1) Symptoms of patients visiting hospital emergency departments
2) Over-the-counter medications purchased at pharmacies
3) Absences from school due to illness
4) Laboratory cultures of infectious pathogens
5) Cases of unusual or serious disease among humans reported to the health department
6) Cases of unusual or serious disease among animals reported to veterinarians
Obviously, any desired observable data can be utilized as the “daily count” for the purpose of the invention. In that regard, the description of the present embodiments of the invention has been presented for the purposes of illustration but is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. As such, while the present invention has been disclosed in connection with the preferred embodiment thereof, it should be understood that there may be other embodiments which fall within the spirit and scope of the invention as defined by the following claims.
Claims
1. A rapid syndrome analysis system comprising:
- a plurality of computer devices located at different geographic locations, wherein each of the plurality of computer devices is connected to a network;
- a first computer connected to the network, wherein the first computer comprises a first computer readable medium with instructions for: receiving syndromatic surveillance count data for a plurality of past time blocks and a current time block from the plurality of computer devices over the network; generating for each of a plurality of moving baseline intervals, a trimmed moving weighted means and sigma of the received syndromatic surveillance count data for the past time blocks within that moving baseline interval; generating a plurality of cumulative sums (CUSUMs) using (1) the received syndromatic surveillance count data for the plurality of past time blocks and the current time block and (2) the generated trimmed moving weighted means and sigmas; and generating one or more flag signals when a magnitude of one or more of the generated CUSUMs exceeds a magnitude of one or more predetermined threshold values within a first set of threshold values stored on the first computer readable medium; and generating an indication of a seriousness or an unusualness of a disease outbreak based on the one or more generated flag signals; and
- a second computer connected to the network, wherein the second computer comprises a second computer readable medium with instructions for: receiving the indication of a seriousness or an unusualness of a disease outbreak from the first computer over the network; and providing a user with a visual cue corresponding to the indication of a seriousness or an unusualness of a disease outbreak, wherein the visual cue is selected from a finite set of visual cues, and wherein each visual cue within the finite set of visual cues is associated with a different weight of seriousness or unusualness of a disease outbreak.
2. The system of claim 1, wherein the plurality of cumulative sums (CUSUMs) is generated simultaneously by the first computer.
3. The system of claim 1, wherein each of the CUSUMs are generated using at least one of a different moving baseline interval or a different predetermined threshold value within a second set of threshold values stored on the first computer readable medium.
4. The system of claim 1, wherein the visual cue is a color selected from a finite set of colors, and wherein each color within the finite set of colors is associated with a different weight of seriousness or unusualness of a disease outbreak.
5. The system of claim 4, wherein the finite set of colors includes blue, yellow, orange, and red, and wherein the colors blue, yellow, orange, and red provide, respectively, indications of increasing unusualness or seriousness of a disease outbreak.
6. The system of claim 1, wherein the plurality of past time blocks and the current block each correspond to a single day.
7. The system of claim 6, wherein the plurality of moving baseline intervals include a first baseline interval of seven days, a second baseline interval of fourteen days, a third baseline interval of twenty-one days, and a fourth baseline interval of twenty-eight days.
8. The system of claim 1, wherein the trimmed moving weighted means and sigmas are generated, in part, by excluding syndromatic surveillance count data for past time blocks for which a flag signal was previously generated.
9. The system of claim 1, wherein the trimmed moving weighted means and sigmas are generated, in part, by excluding zero values and a variable percentage of extreme values.
10. The system of claim 1, wherein the first computer readable medium further comprises instructions for generating a duration of the disease outbreak by determining an amount of past time blocks for which a flag signal was previously and consecutively generated, and wherein the second computer readable medium further comprises instructions for (1) receiving the duration of the disease outbreak from the first computer over the network and (2) providing the user with an indication of the duration of the disease outbreak.
11. The system of claim 1, wherein the received syndromatic surveillance count data for the past and current time blocks includes a plurality of counts of patients that visited each of the different geographic locations during those time blocks, and were experiencing the same syndrome.
12. The system of claim 11, wherein the different geographic locations include one or more hospitals or laboratories.
13. The system of claim 11, wherein the syndrome is a flu-like syndrome, a pox-like rash syndrome, a pulmonary syndrome, a gastrointestinal syndrome, a hepatitis-like syndrome, a neuro-toxic syndrome, an encephalitis-like syndrome, a systemic-illness syndrome, or a sepsis disseminated intravascular coagulation syndrome.
14. The system of claim 1, wherein the received syndromatic surveillance count data for the past and current time blocks includes a plurality of counts of a medication purchased at each of the different geographic locations during those time blocks.
15. The system of claim 14, wherein the different geographic locations include one or more pharmacies.
16. The system of claim 1, wherein the different geographic locations include one or more hospitals, pharmacies, schools, laboratories, health departments, or veterinary clinics.
Type: Application
Filed: Mar 3, 2021
Publication Date: Aug 26, 2021
Patent Grant number: 11868424
Applicant: BECTON DICKINSON AND COMPANY (Franklin Lakes, NJ)
Inventor: Tracy L. Gustafson (Mesquite, TX)
Application Number: 17/190,523