METHODS OF IMPROVING QUALITY CONTROL ACCURACY FOR MEDICAL IMAGES
The present invention relates to a method to improve quality control (QC) accuracy. The method comprises at least one input quality control (IQC) check and at least one output quality control (OQC) check for the image to be analyzed. The integrated IQC and OQC results may be mapped onto a pre-defined multi-dimensional space to evaluate the overall QC result.
This application claims the benefit of U.S. Provisional Application No. 63/253,146 filed on Oct. 7, 2021, titled “ARTIFICIAL INTELLIGENCE DIAGNOSTIC WORKFLOW,” which is incorporated herein by reference in its entirety.
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to a method for improving the robustness of artificial intelligence (AI), especially the AI's ability to distinguish errors from good-quality results during a diagnostic workflow.
Description of Related Art
Disease diagnosis is an important part of the medical field. In recent years, artificial intelligence (AI) has been applied to many medical diagnostic systems to improve the efficiency and accuracy of diagnosis. Thanks to the development of neural networks (NN) and deep learning (DL), an AI with the desired functions can readily be trained to good performance. However, despite high accuracy, deep neural networks often recognize features different from those recognized by human vision (arXiv:1312.6199, 2014 and arXiv:1412.1897, 2015). Also, many studies have indicated that confidence scores are not necessarily related to the probability of correctness. In other words, an AI has no “feeling” of whether the result it generated is reasonable or not (arXiv:1805.11783, 2018). Therefore, unexpected errors sometimes occur even in well-trained AIs.
For AI processing of medical images, object detection and classification are the main focuses. Because AI acts differently from the human mental process, and because it is nearly impossible to set a decision boundary without false positives and false negatives for a classifier, diagnostic AIs occasionally produce erroneous results for which it is difficult to determine where the errors came from.
In an AI performing bone image classification and disease diagnosis, an X-ray image is usually provided as an input for analysis. The image then undergoes quality check, classification, and analysis by suitable AI model(s) according to its classification, as shown in
Therefore, a new method to improve the robustness of AI, especially the AI's ability to distinguish erroneous inputs from good-quality inputs during diagnostic workflow, is still desired.
SUMMARY OF THE INVENTION
To resolve the above problems, the present invention provides a method to accurately perform a quality check on an input image and distinguish bad-quality inputs from good-quality inputs. Besides picking out bad-quality inputs, the present invention also provides evaluations indicating where the bad quality originated.
The method disclosed herein comprises: performing a plurality of quality checks on an input image by algorithms or AI models, wherein the plurality of quality checks comprises at least one input quality control (IQC) check and at least one output quality control (OQC) check; providing a pre-defined multi-dimensional space (MDS) according to the number and type of the quality checks, wherein the dimension number of the MDS is equal to or larger than the number of the quality checks; mapping the result of each quality check onto a corresponding position of the pre-defined space; and determining a QC result based on the information of the corresponding position on the pre-defined space.
In one embodiment, the IQC check comprises a view check, a region of interest (ROI) check, and a compatibility check.
In one embodiment, the OQC check comprises a landmark check.
In one embodiment for knee osteoarthritis diagnosis, the OQC check further comprises a knee joint space width (JSW) check, and a femoral-tibial angle (FTA) check.
In one embodiment for osteoporosis diagnosis, the OQC check further comprises an ROI aspect ratio check, an age check, a femoral neck width (FNW) check, a cortical thickness index (CTI) check, a CTI ratio check, and a bone mineral density (BMD) check.
In another embodiment for osteoporosis diagnosis, the OQC check further comprises an age check and a bone mineral density (BMD) check.
Other objectives, advantages and novel features of the invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings.
The terminology used in the description presented below is intended to be interpreted in its broadest reasonable manner, even though it is used in conjunction with a detailed description of certain specific embodiments of the technology. Certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be specifically defined as such in this Detailed Description section.
The embodiments introduced below can be implemented by programmable circuitry programmed or configured by software and/or firmware, or entirely by special-purpose circuitry, or in a combination of such forms. Such special-purpose circuitry (if any) can be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), etc.
The workflow for the present application is as shown in
In contrast to traditional implementations, the AI reporter generates an overall QC result by analyzing all previous QC results, so the overall QC is based on many different factors which may span a multi-dimensional space. Each QC may represent one or more dimensions in the multi-dimensional space, depending on the data acquired by that QC. For example, in QC of knee joint images, 3 IQC checks (view check, ROI check and compatibility check) and 3 OQC checks (landmark check, joint space width check and femoral-tibial angle check) may be used to assess the overall quality of the input image. Since each of the above QCs generates a numerical/true-false result, a 6-dimensional space can be constructed from the above QC analyses. The constructed 6-dimensional space is thus able to distinguish good-quality images from bad-quality images with higher sensitivity and specificity compared to the dichotomic workflow as shown in
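By way of illustration only, the following Python sketch shows how the six knee QC results might be mapped onto a point in such a 6-dimensional space. The check names, score types, and example values are hypothetical assumptions of this description, not values prescribed by the method.

```python
# Hypothetical sketch: mapping six QC results onto a point in a
# 6-dimensional space.  All names and values are illustrative.
from dataclasses import dataclass

@dataclass
class KneeQCVector:
    view_check: float          # softmax score of the view classifier
    roi_check: float           # probability score of the ROI bounding box
    compatibility_check: bool  # view check and ROI check agree
    landmark_check: bool       # all keypoints within deviation threshold
    jsw_check: bool            # joint space width in plausible range
    fta_check: bool            # femoral-tibial angle in plausible range

    def as_point(self) -> tuple:
        """Return the position of this image in the 6-D QC space."""
        return (
            self.view_check,
            self.roi_check,
            float(self.compatibility_check),
            float(self.landmark_check),
            float(self.jsw_check),
            float(self.fta_check),
        )

# Example: an image whose IQC scores are high and whose OQC checks all
# pass maps to a point lying in a "QC pass" region of the space.
point = KneeQCVector(0.95, 0.90, True, True, True, True).as_point()
print(point)  # (0.95, 0.9, 1.0, 1.0, 1.0, 1.0)
```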
1. System IQC
The first step is to check the image format, such as size, pixel value range, and color format if applicable. In the case where a Digital Imaging and Communications in Medicine (DICOM) file is the input, the related DICOM tags, such as modality type and body part examined, may also be checked. In an embodiment, the DICOM tag of the input image is checked, and the analysis workflow is determined by the body part (e.g. knee, pelvis, spine) label in the DICOM tag. In an embodiment, the analysis is aborted if the DICOM tag is missing.
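A minimal system-IQC sketch is given below, assuming the pydicom library is used to read the input file (the present application does not name any particular library), with a placeholder set of supported body parts:

```python
# Minimal system-IQC sketch using pydicom (an assumption; any DICOM
# reader would do).  The supported body-part set is a placeholder.
import pydicom

SUPPORTED_BODY_PARTS = {"KNEE", "PELVIS", "SPINE"}  # hypothetical

def system_iqc(path: str) -> str:
    """Return the body-part label used to route the analysis workflow,
    or raise if the required DICOM tag is missing or unsupported."""
    ds = pydicom.dcmread(path)
    body_part = getattr(ds, "BodyPartExamined", None)
    if body_part is None:
        raise ValueError("DICOM tag BodyPartExamined missing; abort analysis.")
    if body_part.upper() not in SUPPORTED_BODY_PARTS:
        raise ValueError(f"Unsupported body part: {body_part}")
    # Additional format checks (size, pixel value range) would go here.
    return body_part.upper()
```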
2. Algorithm IQC
Algorithm IQC, the step after system IQC, uses AI image analysis/recognition algorithms to check the quality of the input image. The AI algorithm may be a basic classification model trained on pre-labeled images to distinguish between good images and bad images, or it may be a classification model for specific purposes, such as distinguishing which body part the image belongs to, estimating the shooting angle of the image, or determining whether artificial joints and implants exist.
Besides classification models, a regression model may also be used to provide quality scores. One possibility is to label different images with numerical values (quality scores) during AI training.
Another alternative is to extract the results of network operations from a certain layer in the classification model as a quality score. For example, in the last layer of a multi-class classification model, a softmax function is usually used to normalize the probability of each category so that the probabilities sum to one, and the category with the highest value is then selected as the classification prediction result. This softmax score or its derived calculations can be used as the quality score. Another possibility is to extract values from the layer output before the softmax normalization. These values are related to the degree of activation in the network for the image under specific categories, and can also be used to calculate the quality score.
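The following sketch illustrates the derivation of a quality score from the last layer of a classifier; the logits are hypothetical stand-ins for a network's pre-softmax output:

```python
# Sketch: deriving a quality score from a classifier's last layer.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Normalize logits so the class probabilities sum to one."""
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.1, 0.3, -1.2, 0.8])   # hypothetical pre-softmax values
probs = softmax(logits)
prediction = int(probs.argmax())           # classification result
quality_score = float(probs.max())         # softmax score as quality score
# Alternatively, the raw logit of the predicted class (logits[prediction])
# reflects the network's activation for that category and can also be
# used to derive a quality score.
```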
An example of algorithm IQC is training a multi-class view classifier model to determine which part of the body the image belongs to, the shooting angle of the image, and whether it shows the left or right side. In addition to outputting the classification result, the model also outputs the softmax value as the quality score.
In the present application, the IQC check for all types of images comprises a view check, a region of interest (ROI) check and a compatibility check. The view check is performed by a view classifier model to determine which part of the body the input image belongs to. It may be executed by an AI trained on a collection of images labeled with their corresponding categories. The ROI check performs similarly but differs slightly from the view check: the view check finds the category of the whole image, whereas the ROI check identifies the ROI block of a specific object (e.g. a knee joint or a hip) within the input image. The ROI check may be executed by another AI trained on a collection of images labeled with a specific object and the location of the ROI within the images. The compatibility check is to determine whether the classification results from the view check and the ROI check coincide with each other. While it might be considered redundant to employ both the view check and the ROI check to classify the input image, applying both checks increases the accuracy of the overall QC, since the AIs of the two checks learn differently. The probability that both the view check and the ROI check provide a false result is small, so false predictions are minimized by checking whether the two checks predict the same result.
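A minimal sketch of the compatibility check logic follows; the category labels are illustrative:

```python
# Hypothetical compatibility check: the view check classifies the whole
# image, the ROI check detects a specific object; QC passes only when
# the two independently trained predictions agree.
def compatibility_check(view_label: str, roi_label: str) -> bool:
    """True when the view classifier and ROI detector predict the same
    category; a disagreement flags a likely misjudgment."""
    return view_label == roi_label

assert compatibility_check("Knee AP/PA Right", "Knee AP/PA Right")
assert not compatibility_check("Knee Lateral Right", "Knee AP/PA Right")
```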
3. Model OQC
This stage checks whether the output values of the models or algorithms are reasonable and whether there is any abnormality. This can be done by setting a normal acceptable range based on the distribution of the training data, a reasonable range of physiological structural measurements, or an internal value generated by the model (e.g. the maximum heatmap value predicted by the landmark model at each point).
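As a sketch, such a range check may be implemented as follows, where the mean, standard deviation, and multiplier k are hypothetical values standing in for statistics of the training data:

```python
# Sketch of a model-OQC range check: a measured value passes when it
# falls inside a normal range estimated from the training data.
def in_normal_range(value: float, mean: float, std: float, k: float = 2.0) -> bool:
    """True when value lies within mean +/- k standard deviations."""
    return abs(value - mean) <= k * std

# e.g. a joint space width check against a hypothetical distribution
jsw_mm = 4.8
print(in_normal_range(jsw_mm, mean=5.0, std=1.0))  # True -> JSW check passes
```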
In order to complete the work of image analysis and diagnosis, the models of all IQC and OQC may be decomposed into multiple models performing different subtasks. These subtasks usually have a sequential relationship, though not necessarily in all cases. For example, the first model may be an object detection model that finds the region of interest (bounding box) of the knee joint in the entire image. The second model may find the edge of the knee joint bone or specific anatomical locations within the region of interest (ROI) according to the results of the first model. The third model may use these points to calculate the joint space width (JSW), and the fourth model may use the points to calculate the femoral-tibial angle (FTA). The fifth model may label the areas of the medial tibia, lateral tibia, medial femur, lateral femur and other local small areas derived from the points of the previous models, and send them into models for classification or characteristic value prediction for disease classification. Examples include joint space narrowing (JSN) in degenerative arthritis, osteophytes, sclerosis, cysts, osteoporosis, bone mineral density (BMD), vertebral compression fracture and so on. Additional models may collect all or some of the above results to make new predictions. For example, the Kellgren-Lawrence grade commonly used to evaluate the severity of osteoarthritis (OA) may be predicted from the results of JSW, JSN, osteophyte, sclerosis and FTA.
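A structural sketch of such a subtask pipeline is given below. The stub functions stand in for trained models and return fixed hypothetical values; only the sequencing and the collection of outputs for the AI reporter are the point here:

```python
# Structural sketch of the subtask pipeline.  The stubs return fixed
# hypothetical values in place of real model inferences.
def detect_roi(image):       return {"box": (10, 20, 200, 220), "score": 0.92}
def find_landmarks(roi):     return {"points": [(5, 7), (9, 3)], "max_heat": 0.88}
def measure_jsw(points):     return 4.8    # mm, hypothetical
def measure_fta(points):     return 176.0  # degrees, hypothetical

def run_pipeline(image):
    """Run the subtasks in order, collecting every output and auxiliary
    inference log for the AI reporter's overall QC."""
    log = {}
    log["roi"] = detect_roi(image)
    log["landmarks"] = find_landmarks(log["roi"])
    log["jsw"] = measure_jsw(log["landmarks"]["points"])
    log["fta"] = measure_fta(log["landmarks"]["points"])
    return log

print(run_pipeline(image=None))
```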
4. AI Reporter (Overall QC)
Traditional QC processes perform a check at the output of each subtask and stop the whole process with an abnormal message if any abnormality is found in between. In contrast, in the present application, after all models are executed, the output values of the model predictions and other auxiliary inference logs (e.g. the softmax value, the deviation distance of the landmark points, etc.) are collected and used to make an integrated judgement as an overall QC result. An AI reporter integrating all predictions has several advantages, which are described below.
First, models with different functions may make misjudgments or may not be robust enough to new data (such as unseen data or data from a new domain). Judging the image quality based on the results of a single model carries a high risk of being wrong. In contrast, different models have a lower chance of misjudging the same data at the same time because of their different tasks, training data, or model architectures. For example, the view classifier may misjudge the lateral view of a knee joint as a posterior-anterior view (PA view), but a PA knee object detector will not catch the object, or will return a very low probability score for the bounding box, in the following step. It is almost impossible for the two models to misjudge a lateral knee as a PA knee at the same time.
Second, performing an integrated judgement after all the models are executed is more in line with the general process of AI debugging. By doing this one has a comprehensive overview, and it is easier to find out which model has an error. Following the above example, the view classifier judges the input image as a lateral view, but the object detector finds the ROI of a PA view, and the landmark model shows that the prediction of some points has a large deviation compared to the previously established average positions. Combining these results together, it is very likely that the object detector gave the wrong ROI.
In the embodiments of the present application, a rule engine for the output of each model is constructed based on previously collected test data. Some examples include whether the view classifier result is compatible with the DICOM tag, the view probability, and the landmark deviation. Different combinations of the outputs can be categorized as different groups or different positions/regions in a multi-dimensional space (MDS) for QC. The next step is to review the pre-defined QC groups or QC MDS and refine the rule engine. The rule engine may be further optimized iteratively as new data comes in.
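A hypothetical rule-engine sketch is shown below; the rules and thresholds are illustrative, not the application's actual rule sets:

```python
# Hypothetical rule engine: each rule inspects the collected model
# outputs and, when triggered, contributes a QC warning message.
RULES = [
    (lambda o: not o["compat_ok"],
     "View check and ROI check disagree."),
    (lambda o: o["roi_score"] < 0.8,
     "ROI probability below threshold; possible implant or non-standard view."),
    (lambda o: o["landmark_outliers"] > 0,
     "Some landmarks deviate from the average shape."),
]

def overall_qc(outputs: dict) -> list:
    """Return the warning messages of all triggered rules; an empty
    list means the input passes the overall QC."""
    return [msg for rule, msg in RULES if rule(outputs)]

print(overall_qc({"compat_ok": True, "roi_score": 0.77, "landmark_outliers": 0}))
# ['ROI probability below threshold; possible implant or non-standard view.']
```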
When the amount of image data is large enough, images with labeled QC tags (indicating the QC groups or the positions in the QC MDS) can also be used directly, combined with the values generated by the models, to train a QC engine AI model (meta model).
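As a sketch of such a meta model, assuming scikit-learn is available (the application does not specify a training library), a simple classifier can be fitted on the collected model outputs and their QC tags:

```python
# Sketch of training a QC meta model on labeled QC tags; the feature
# layout and data are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: values generated by the models for one image, e.g.
# [view softmax, ROI score, compatibility (0/1), landmark outlier count]
X = np.array([[0.95, 0.90, 1, 0],
              [0.85, 0.40, 0, 5],
              [0.92, 0.88, 1, 1],
              [0.60, 0.30, 0, 4]])      # hypothetical training data
y = np.array([1, 0, 1, 0])             # QC tags: 1 = pass, 0 = fail

meta_model = LogisticRegression().fit(X, y)
print(meta_model.predict([[0.90, 0.77, 1, 0]]))  # predicted QC group
```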
5. Displaying QC Results on User Interface (UI)
In the last step, an AI reporter displaying QC results combines the AI results and the QC rule judgments, and sends a message back to the system manager to determine what kind of result and QC warning message the user will receive on the UI.
EXAMPLES
The following examples are provided to further illustrate the image processing method as claimed.
1. AI Prediction Result for Images of Lumbar Spine
An AI is trained to distinguish more than 30 categories of X-ray bone images, where each category represents a specific body part with a specific shooting method. After training, the AI is used to “predict” the categories of 1,650 input X-ray images, whose ground truths are all the right lateral view of the lumbar spine. The result in
2. ROI Check for Knee Joint Images
An ROI algorithm is an object detector AI trained on a set of pre-labeled data. This trained ROI check algorithm is used to judge images of knee joints. The results of four input images with PA view are shown in
3. Overall QC for Knee Joint Images
The pipeline of an overall QC for knee joint images is shown in
The accepted image is then analyzed by a view check (view classifier), an ROI check (knee ROI), and a compatibility check as algorithm IQC for knee joint images. The result from the knee ROI is further fed into OQC components to perform a landmark check (knee keypoints), a joint space width (JSW) check and a femoral-tibial angle (FTA) check. The checks for quality control of knee joint images are shown in Table 1. The details of the landmark check of the knee joint are provided in Example 6 below. The JSW check is performed by an AI trained to determine the distance between the apex of the medial condyle of the femur and the posterior end of the tibia (named the joint space width) of the knee joint image. The FTA check is performed by an AI trained to measure the angle between the femur and tibia (the femoral-tibial angle) of the knee joint image.
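By way of illustration, the FTA may conceptually be derived as the angle between a femoral axis and a tibial axis defined by landmark points; the geometric sketch below uses hypothetical points and does not represent the trained AI's actual computation. Depending on convention, the FTA may be reported as this angle or as its supplement:

```python
# Geometric sketch of an FTA-style measurement: the angle between a
# femoral axis and a tibial axis, each defined by two landmark points.
import numpy as np

def angle_between(p1, p2, q1, q2) -> float:
    """Angle in degrees between line p1->p2 and line q1->q2."""
    u = np.asarray(p2, float) - np.asarray(p1, float)
    v = np.asarray(q2, float) - np.asarray(q1, float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

femur_axis = ((100, 10), (105, 200))   # hypothetical landmark pairs
tibia_axis = ((105, 200), (98, 400))
angle = angle_between(*femur_axis, *tibia_axis)
print(round(angle, 1), round(180 - angle, 1))  # deviation angle and supplement
```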
At the overall QC stage (AI reporter), all the above results are collected and analyzed together with predefined rule sets. If none of the rules is fulfilled, the input file is considered a QC pass and no warning messages are displayed on the UI. If one or multiple QC rules are met, the corresponding QC warning messages are displayed on the UI. The rule sets of knee joint QC are listed below in Table 2.
Three QC examples are given below:
(a) An image incorrectly labeled as knee AP (anterior-posterior) in the DICOM tag:
1. The System IQC outputs the DICOM tag “Protocol Name” as “Knees AP/PA (anterior-posterior/posterior-anterior) Standing”,
2. the View Classifier outputs “Knee Lateral Right” with probability score >0.8,
3. the Knee ROI outputs “Knee AP/PA Right” with probability score <0.5, and
4. the Knee Keypoints outputs 5 locations with SD>2.
The AI reporter determines that this image is probably not a true knee AP image, and the UI then shows the warning message “The input image is unlikely a knee AP view.”
(b) A knee image with implant(s):
1. The System IQC outputs the DICOM tag “Protocol Name” as “Knees AP Standing”,
2. the View Classifier outputs “Knee AP/PA bilateral” with probability score >0.8,
3. the Knee ROI outputs “Knee AP/PA Right” with probability score <0.5, and
4. the Knee Keypoints outputs 2 locations with SD>2.
The AI reporter determines that this image is a knee AP/PA image, probably with an implant or taken at a non-standard imaging angle, and the UI then shows a warning message stating that the input image might contain implants or might have been taken in a non-standard position.
(c) The image in
By using the method of the present application to analyze multiple modules at the same time, the knee finder which performs the ROI check finds the frontal (PA) ROI of the right knee, although the score is only 0.77, below the threshold value of 0.8, and thus triggers the QC flag warning of an abnormality. The analysis of the other modules still carries on, and the output of the landmark module shows that each landmark point is within the range of its average point, so this image passes the QC criterion for the landmark check. Summing the above information together, one can speculate that this image should be frontal and should contain implants, because the score of the knee ROI check is only slightly below the threshold value.
4. Overall QC for Hip Images
The pipeline of an overall QC for hip images is shown in
The first step of the pipeline for QC of a hip image is a system IQC which analyzes the DICOM tag of the input file to check whether it contains keywords such as “PELVIS”, “ABDOMEN”, etc.
In the next step, algorithm IQC, the first set of applied QC classifiers is similar to that of the knee joint QC, i.e. a view check (classification of the whole image), an ROI check (finding the ROI and its category), and a compatibility check (checking whether the views determined by the view check and the ROI check are concordant). Using the results of the view check and its confidence value (the output value of one of the hidden layers), together with the results and confidence value of the ROI check, the input quality can be comprehensively judged, and a corresponding UI message can be provided to the user.
The third step, model OQC, comprises several QC methods which are specific to the hip region. The ROI aspect ratio check verifies whether the aspect ratio of the found ROI is within a specific range; the hip landmark check finds landmark points, each point having a corresponding confidence value threshold that must be satisfied to pass the QC; the age check verifies whether age information is attached to the DICOM file; the FNW check determines whether the measured femoral neck width (FNW) is within a reasonable range; and the CTI check determines whether the measured cortical thickness index (CTI) is within a reasonable range. The checks for quality control of hip images are shown in Table 3. The details of the landmark check of the hip are provided in Example 7 below. The ROI aspect ratio check uses the ratio of height to width of the ROI identified by the ROI check. The age check verifies the availability of age information in the DICOM tag. The FNW check is performed by an AI trained to measure the width of the femoral neck. The CTI check and the CTI ratio check are performed by AIs trained to calculate the cortical thickness index and the ratio thereof, respectively.
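A minimal sketch of the ROI aspect ratio check follows; the accepted range is a hypothetical placeholder:

```python
# Sketch of the ROI aspect ratio check: the height/width ratio of the
# detected ROI must fall within a specific range (placeholder values).
def roi_aspect_ratio_check(box, lo=0.8, hi=1.4) -> bool:
    """box = (x_min, y_min, x_max, y_max); pass when height/width
    lies within [lo, hi]."""
    x0, y0, x1, y1 = box
    ratio = (y1 - y0) / (x1 - x0)
    return lo <= ratio <= hi

print(roi_aspect_ratio_check((50, 40, 250, 260)))  # ratio 1.1 -> True
```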
An AI reporter then collects and analyzes all the results together and provides an overall QC result with predefined rule sets, as shown in Table 4.
The following is an example of the overall QC for a hip image. An image of the hip region with an implant (as shown in
The above result is in line with the third criterion of the hip rule engine, and the overall QC may judge that there is probably an artificial joint in the image. If the QC were performed step by step without integrating all the results together, the image would be kicked out at the ROI check with only a QC-failed message, instead of going further and finding out that there may be implants.
5. Overall QC for Spinal Images
The pipeline of an overall QC for spinal images is shown in
An AI reporter then collects and analyzes all the results together and provides an overall QC result with predefined rule sets, as shown in Table 6. The passed QC checks may be designated as “true”, “1”, or “positive”, whereas the failed QC checks may be designated as “false”, “0”, or “negative” in the multi-dimensional space.
6. Landmark Check of Knee Joint
For each knee in the image, 74 keypoints are labeled. If these points are linked in a specified order, the set of points becomes a shape of the knee joint.
In the training data, 1,500 labeled knees are used. These 1,500 shapes (and thus all the keypoint sets) are aligned via rotation, scaling, and shifting until the distance between the shapes (from each keypoint to the corresponding keypoints on the other shapes) becomes minimal. The resulting shape is called the average shape. In the domain of statistical shape models (SSM), this process is called Procrustes analysis. The 1,500 shapes after the Procrustes analysis provide a mean position and standard deviation (std) for each keypoint (i.e. each keypoint has its own mean position and standard deviation).
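A simplified generalized Procrustes sketch (using numpy) is given below; the synthetic shapes stand in for the 1,500 labeled knees:

```python
# Simplified generalized Procrustes sketch: shapes are centered, scaled
# to unit norm, rotated onto the current mean, and the mean is
# re-estimated until it stabilizes.
import numpy as np

def rotate_onto(shape: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Optimal rotation of a centered, unit-scaled shape onto ref
    (orthogonal Procrustes via SVD)."""
    u, _, vt = np.linalg.svd(shape.T @ ref)
    return shape @ (u @ vt)

def procrustes_mean(shapes, iters=10):
    shapes = [s - s.mean(axis=0) for s in shapes]        # remove shift
    shapes = [s / np.linalg.norm(s) for s in shapes]     # remove scale
    mean = shapes[0]
    for _ in range(iters):
        shapes = [rotate_onto(s, mean) for s in shapes]  # remove rotation
        mean = np.mean(shapes, axis=0)
        mean /= np.linalg.norm(mean)                     # keep unit scale
    return mean, shapes

rng = np.random.default_rng(0)
base = rng.random((74, 2))                  # 74 keypoints per knee shape
shapes = [base + 0.01 * rng.standard_normal((74, 2)) for _ in range(20)]
mean_shape, aligned = procrustes_mean(shapes)
# Per-coordinate standard deviation of each keypoint across the aligned set:
per_point_std = np.asarray(aligned).std(axis=0)   # shape (74, 2)
```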
During the QC process, each predicted keypoint is checked to see whether its distance to the corresponding keypoint on the average shape exceeds a certain threshold (e.g. 2 standard deviations).
For example, keypoint #22 might have a mean position at (X22, Y22) and a standard deviation std22 (calculated from all the keypoints #22 of the 1,500 shapes). At the prediction phase, the model outputs the 74 keypoints. The QC process aligns these 74 points to the previously obtained average shape so that the distance between each point and its corresponding point on the average shape becomes minimal. Keypoint #22 after the alignment process has a coordinate of (Xp, Yp), and the QC process calculates the distance D22 between (Xp, Yp) and (X22, Y22). If D22 is larger than the predefined threshold (e.g. 2*std22), keypoint #22 is considered abnormal.
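A minimal sketch of this deviation check follows; the average shape and per-keypoint standard deviations are placeholders that would, in practice, come from the Procrustes analysis above:

```python
# Sketch of the landmark QC check: after alignment to the average
# shape, flag keypoints more than k standard deviations from their mean.
import numpy as np

def landmark_qc(aligned_points, mean_shape, stds, k=2.0):
    """Return indices of keypoints whose distance to their mean
    position exceeds k times their standard deviation."""
    dists = np.linalg.norm(aligned_points - mean_shape, axis=1)
    return np.where(dists > k * stds)[0]

mean_shape = np.zeros((74, 2))                  # placeholder average shape
stds = np.full(74, 0.05)                        # placeholder per-point std
pred = np.zeros((74, 2)); pred[22] = (0.3, 0.3) # keypoint #22 deviates
print(landmark_qc(pred, mean_shape, stds))      # -> [22]
```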
7. Landmark Check of Hip
The hip keypoint model outputs 36 keypoints (18 for each hip). Each keypoint prediction comes from a heatmap in the last layers of the network. The predicted keypoint location is obtained by taking the location of the highest value on the heatmap (highest activation). However, in the case that the hip anatomical feature is not present (or is blurred/obscured) in the input image, the heatmap will still have a highest value and thus gives a nonsensical prediction. The idea of the hip landmark QC process is similar to that of a classification model: if the activation (softmax, probability, score, etc.) is not high enough, it might be a false prediction. After analyzing the distribution of activation scores for each keypoint in the training data, a set of thresholds can be determined for these 36 keypoints.
The hip keypoint model outputs both the location of each keypoint and the corresponding score. The QC process checks the score of each keypoint against its predefined threshold (e.g. keypoint #3 could have a threshold score of 0.4, while keypoint #10 could have a threshold of 0.8).
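The following sketch illustrates the extraction of keypoint locations and peak scores from heatmaps and their comparison against per-keypoint thresholds; the heatmaps and threshold values are synthetic placeholders:

```python
# Sketch of the hip landmark QC: each keypoint location is the argmax
# of its heatmap, and the peak value is compared with a per-keypoint
# threshold learned from the training data.
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """For each (H, W) heatmap, return ((row, col), peak score)."""
    out = []
    for hm in heatmaps:
        idx = np.unravel_index(np.argmax(hm), hm.shape)
        out.append((idx, float(hm[idx])))
    return out

def landmark_score_qc(heatmaps, thresholds):
    """Return indices of keypoints whose peak activation falls below
    its predefined threshold (likely false predictions)."""
    preds = keypoints_from_heatmaps(heatmaps)
    return [i for i, (_, score) in enumerate(preds)
            if score < thresholds[i]]

rng = np.random.default_rng(1)
heatmaps = 0.3 * rng.random((36, 64, 64))       # weak synthetic peaks
thresholds = np.full(36, 0.4)                   # e.g. threshold 0.4 for all
print(landmark_score_qc(heatmaps, thresholds))  # all 36 keypoints flagged
```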
The foregoing description of embodiments is provided to enable any person skilled in the art to make and use the subject matter. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the novel principles and subject matter disclosed herein may be applied to other embodiments without the use of the innovative faculty. The claimed subject matter set forth in the claims is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. It is contemplated that additional embodiments are within the spirit and true scope of the disclosed subject matter. Thus, it is intended that the present invention covers modifications and variations that come within the scope of the appended claims and their equivalents.
Claims
1. A method to improve quality control (QC) accuracy, comprising:
- performing a plurality of quality checks on an input image by algorithms or AI models, wherein the plurality of quality checks comprises at least one input quality control (IQC) check and at least one output quality control (OQC) check;
- providing a pre-defined multi-dimensional space (MDS) according to the number and type of the quality checks, wherein the dimension number of the MDS is equal to or larger than the number of the quality checks;
- mapping the result of each quality check onto a corresponding position of the pre-defined space; and
- determining a QC result based on the information of the corresponding position on the pre-defined space.
2. The method of claim 1, wherein the IQC check comprises a view check, a region of interest (ROI) check, and a compatibility check.
3. The method of claim 2, wherein the OQC check comprises a landmark check.
4. The method of claim 3, wherein the input image is a skeletal image of the knee region, and the input image is for knee osteoarthritis diagnosis.
5. The method of claim 4, wherein the OQC check further comprises a knee joint space width (JSW) check, and a femoral-tibial angle (FTA) check.
6. The method of claim 5, wherein a first knee QC region is the region of the MDS in which the quality checks are all positive.
7. The method of claim 5, wherein a second knee QC region is the region of the MDS in which the ROI check, the compatibility check and the landmark check are all negative.
8. The method of claim 5, wherein a third knee QC region is the region of the MDS in which the landmark check is positive, and at least one of the view check, the ROI check and the compatibility check is negative.
9. The method of claim 5, wherein a fourth knee QC region is the region of the MDS in which both the landmark check and the FTA check are positive, and the JSW check is negative.
10. The method of claim 5, wherein a fifth knee QC region is the region of the MDS in which both the landmark check and the JSW check are positive, and the FTA check is negative.
11. The method of claim 3, wherein the input image is a skeletal image of the hip region, and the input image is for osteoporosis diagnosis.
12. The method of claim 11, wherein the OQC check further comprises an ROI aspect ratio check, an age check, a femoral neck width (FNW) check, a cortical thickness index (CTI) check, a CTI ratio check, and a bone mineral density (BMD) check.
13. The method of claim 12, wherein a first hip QC region is the region of the MDS in which the quality checks are all positive.
14. The method of claim 12, wherein a second hip QC region is the region of the MDS in which the view check is positive and the compatibility check is negative.
15. The method of claim 12, wherein a third hip QC region is the region of the MDS in which both the compatibility check and the landmark check are negative.
16. The method of claim 12, wherein a fourth hip QC region is the region of the MDS in which at least one of the view check, the ROI check, the compatibility check and the landmark check is negative.
17. The method of claim 12, wherein a fifth hip QC region is the region of the MDS in which the age check is negative.
18. The method of claim 12, wherein a sixth hip QC region is the region of the MDS in which both the CTI check and the CTI ratio check are negative.
19. The method of claim 12, wherein a seventh hip QC region is the region of the MDS in which the FNW check is negative.
20. The method of claim 3, wherein the input image is a skeletal image of the spine region, and the input image is for osteoporosis diagnosis.
21. The method of claim 20, wherein the OQC check further comprises an age check and a bone mineral density (BMD) check.
22. The method of claim 21, wherein a first spine QC region is the region of the MDS in which the quality checks are all positive.
23. The method of claim 21, wherein a second spine QC region is the region of the MDS in which the view check is positive and the compatibility check is negative.
24. The method of claim 21, wherein a third spine QC region is the region of the MDS in which at least one of the view check, the ROI check, and the compatibility check is negative.
25. The method of claim 21, wherein a fourth spine QC region is the region of the MDS in which at least one of the age check and the BMD check is negative.
Type: Application
Filed: Oct 7, 2022
Publication Date: Apr 13, 2023
Inventors: QING-ZONG TSENG (New Taipei City), CHENG-WEI LIN (New Taipei City), PO-YU CHEN (New Taipei City)
Application Number: 17/938,695