Image correction using an invertible network
For correction of an image from an imaging system, an inverse solution uses an imaging prior as a regularizer and a physics model of the imaging system. An invertible network is used as the deep-learnt generative model in the regularizer of the inverse solution with the physics model of the degradation behavior of the imaging system. The prior model based on the invertible network provides a closed-form expression of the prior probability, resulting in a more versatile or accurate probability prediction.
The present embodiments relate to image correction. Images often include artifacts. An image may be restored by reducing the artifacts. Image restoration tasks, such as denoising, deblurring, inpainting, or reconstruction, are inverse problems in which a degradation operator was applied to the image. The original image is recovered by combining a physical model of the degradation behavior and a prior model of what a good image should be. Image restoration is typically cast as a Maximum A Posteriori (MAP) estimation problem, then solved using constrained optimization.
Prior approaches to this inverse problem manually craft a prior model and an algorithm that provide a good compromise between reconstruction quality and algorithmic complexity. These approaches suffer from the simplistic assumptions made to reduce the complexity of the manual prior and from the need to strike this compromise, which may differ across situations. Another approach uses supervised machine learning to output a corrected image given an input image. The machine-learning approach suffers from the lack of a sufficiently large set of distortion-specific and imaging-system-specific training data and from poor generalization: if the distortion changes, the mapping must be retrained from scratch even though the observed object is the same. Recently, data-driven methods that learn a direct mapping from input measurements to output images have become the state of the art in imaging. These methods rely on the availability of good quality target images, which might not be available when the artifact-producing target scan protocol is already the state of the art and the only source of images. Learning a direct mapping couples the prior model and the acquisition protocol, so the obtained network may have to be retrained if the acquisition protocol changes.
U.S. Pat. No. 10,387,765 proposes the use of a deep generative model to represent the image prior, with a gradient descent solver for the Maximum A Posteriori (MAP) imaging problem in image correction. The generative model is trained without supervision from the input data by global maximum likelihood. The generative model may suffer from the lack of available good quality target images and merely approximates the probability.
SUMMARY
By way of introduction, the preferred embodiments described below include methods, systems, instructions, and non-transitory computer readable media for correction of an image from an imaging system. An inverse solution uses an imaging prior as a regularizer and a physics model of the imaging system. An invertible network is used as the deep-learnt generative model in the regularizer of the inverse solution with the physics model of the degradation behavior of the imaging system. The prior model based on the invertible network provides a closed-form expression of the prior probability, resulting in a more versatile or accurate probability prediction.
In a first aspect, a method is provided for correction of an image from a medical scanner. The medical scanner acquires the image representing a patient. The image has a level of artifact due to the acquisition by the medical scanner. A machine determines a probability for the image with a deep generative machine-learnt model. The deep generative machine-learnt model is an invertible network. The machine minimizes the level of the artifact in the image. The minimizing is a function of a physics model and the probability. The physics model includes a characteristic specific to the medical scanner. The image output from the minimizing is transmitted. The image is of the patient and from the medical scanner with the level of the artifact minimized.
In various embodiments, the image is acquired as a computed tomography, magnetic resonance, ultrasound, positron emission tomography, or single photon emission computed tomography image as a two-dimensional representation of pixels or a three-dimensional set of voxels. The image may have corruption by noise artifact, blur artifact, inpainting artifact, reconstruction artifact, or combinations thereof.
The invertible network is a closed-form expression of a prior probability. Any invertible network may be used, such as a stack of atomic invertible blocks.
In one embodiment, the invertible network was trained with joint optimization of imaging and weights of the invertible network. The joint optimization may have been based, in part, on a probability density function of noise and/or a determinant of a Jacobian. In one approach, the joint optimization used one loss computed over the weights as learnable parameters of the invertible network and the imaging as reconstructed images. In another approach, the invertible network was trained with stochastic gradient descent over the weights and the imaging. These characteristics of the training form the invertible network, providing values of the weights different than if trained in a different way.
In one embodiment, the minimizing inversely solves with the probability as a regularizer term. The minimization may be by gradient descent with the physics model in a first term and a gradient of the probability in a second term. The physics model includes a distribution of the characteristic specific to the medical scanner. In other embodiments, the minimization determines a gradient direction for a next iteration in the minimizing. The gradient direction is determined from the deep generative machine-learnt model.
In another embodiment, the transmission of the image is to a display. Transmission to memory or over a network may be provided.
In a second aspect, a method is provided for reducing distortion in an image from an imaging system. The image of an object is optimized to have less of the distortion. The optimizing is with a gradient-based optimization including a regularizer from a log-likelihood output by a machine-learnt normalizing flow. The image as optimized is displayed.
In further embodiments, the optimizing is optimizing with a physics model encoding a transform of the imaging system. In other embodiments, the optimizing is optimizing with the machine-learnt normalizing flow being an invertible network of atomic invertible blocks. In yet other embodiments, the optimizing includes optimizing with the machine-learnt normalizing flow having been trained with joint optimization from reconstructed images and learnable parameters of the normalizing flow using a known probability distribution.
In a third aspect, a method is provided for correction of a first image from an imaging system. The imaging system acquires the first image having a distortion due to the acquisition by the imaging system. A machine determines a likelihood of the first image having the distortion with a machine-learnt invertible network. A corrected image from the first image is solved for using the likelihood and a transform for the imaging system. The corrected image is transmitted.
In one embodiment, the machine-learnt invertible network was trained with joint optimization of reconstruction images and weights of the invertible network.
The present invention is defined by the following claims, and nothing in this section should be taken as a limitation on those claims. Further aspects and advantages of the invention are discussed below in conjunction with the preferred embodiments and may be later claimed independently or in combination.
The components and the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.
U.S. Pat. No. 10,387,765, the disclosure of which is incorporated herein by reference, proposes the use of a deep generative model to represent the image prior, a gradient descent solver for the Maximum A Posteriori (MAP) imaging problem, and the unsupervised training of the generative model from the input data by global maximum likelihood. The disclosed generative models include generative adversarial networks and variational auto-encoders. These models do not provide a closed-form expression of the prior probability for the unsupervised training.
The generative model of U.S. Pat. No. 10,387,765 may be replaced with a normalizing flow, or invertible network, as the prior model. This invertible network provides a closed-form expression of the prior probability for unsupervised training. By avoiding the approximations that previous generative models introduce into the inverse solution, the invertible network may better indicate the probability of the image prior. A greater range of training images may be used, avoiding limitations due to the current state-of-the-art protocol producing some images with artifacts.
In the discussion below, the teachings of U.S. Pat. No. 10,387,765 are provided. The use of the invertible network as an alternative generative model is included at the end of the discussions of acts 32 and 34.
Deep generative models directly evaluate the probability that a new image belongs to the same category as the training data. For example, deep generative models may predict whether or not a photograph is of a bird. While those models have shown their ability to synthesize natural-looking images, such as of a bird, the generative models have seen little use in other tasks due to their complexity, which does not allow easy manipulation by an inference or optimization algorithm.
The deep generative model may be used for image correction. The inverse problem is solved using deep priors. To recover an underlying image X given corrupted measurements Y, a deep generative model is integrated into the inverse problem framework rather than replacing the framework. The generative model takes an image as its input and returns the prior log-likelihood of the image. This log-likelihood is used as a regularization term in the inverse problem. Solving the inverse problem reduces the distortions, reconstructing the more ideal or corrected image X from the corrupted measurements Y.
Additional, different, or fewer acts may be provided. For example, act 30 is replaced with loading a medical image or other type of image. As another example, act 38 is not performed. In yet another example, act 36 is not performed.
The acts are performed in the order shown (top to bottom or numerical), but other orders may be used. For example, acts 32 and 36 are performed as part of act 34.
The acts are performed by the system of
In act 30, a medical scanner acquires an image representing a patient. The image is made available by or within the medical scanner. Acquisition may be through transmission over a network or from a memory. A processor may extract the data from a picture archive communications system (PACS) or a medical records database. Alternatively, data not in the medical environment is acquired, such as capturing or loading a photograph or video. Other sensors (e.g., acoustic transducers or camera) may generate an image in alternative embodiments.
The image is medical imaging data. The medical image is a frame of data representing the patient. The data may be in any format. While the terms image and imaging are used, the image or imaging data may be in a format prior to actual display of the image. For example, the medical image may be a plurality of scalar values representing different locations in a Cartesian or polar coordinate format different than a display format. As another example, the medical image may be a plurality of red, green, blue (e.g., RGB) values output to a display for generating the image in the display format. The medical image may be a currently or previously displayed image in the display or another format. The image or imaging is a dataset that may be used for imaging, such as scan data representing the patient. The scan data represents measurements Y from a specific medical scanner or imaging system.
Any type of medical image and corresponding medical scanner may be used. In one embodiment, the medical image is a computed tomography (CT) image acquired with a CT system. For example, a chest CT dataset may be used for detecting a bronchial tree, fissures, and/or vessels in the lung. For CT, the raw data from the detector is reconstructed into a three-dimensional representation. As another example, magnetic resonance (MR) data representing a patient is acquired. MR data is acquired with an MR system. The data is acquired using a pulse sequence for scanning a patient. Data representing an interior region of a patient is acquired. For MR, the magnetic resonance data is k-space data. Fourier analysis is performed to reconstruct the data from the k-space into a three-dimensional object or image space. The data may be ultrasound data. Beamformers and a transducer array scan a patient acoustically. The polar coordinate data is detected and processed into ultrasound data representing the patient. The data may be positron emission tomography (PET), single photon emission computed tomography (SPECT), or other nuclear imaging data. Radioactive emissions from within the patient are detected and reconstructed into imaging data.
The medical image represents tissue and/or bone structure of the patient. Alternatively, the medical image represents flow, velocity, or fluids within the patient. In other embodiments, the medical image represents both flow and structure. For PET and SPECT, the scan data represents function of the tissue, such as uptake.
The medical image represents a one, two, or three-dimensional region of the patient. For example, the medical image represents an area or slice of the patient as pixel values. A three-dimensional volume may be represented as pixel values by rendering to a two-dimensional format. As another example, the medical image represents a volume or three-dimensional distribution of voxels. Values are provided for each of multiple locations distributed in two or three dimensions. The medical image is acquired as a frame of data. The frame of data represents the scan region at a given time or period. The dataset may represent the area or volume over time, such as providing a 4D representation of the patient.
The image may include one or more artifacts or distortions. Different modalities of imaging are susceptible to different types of artifacts or corruption. The physics for scanning and/or the processing to create the image (e.g., measurements) from the scan may generate an artifact. Motion of the patient or sensor performing the scan may generate an artifact. Example artifacts in medical imaging include noise, blur (e.g., motion artifact), shading (e.g., blockage or interference with sensing), missing information (e.g., missing pixels or voxels in inpainting due to removal of information or masking), reconstruction (e.g., degradation in the measurement domain), and/or under-sampling artifacts (e.g., under-sampling due to compressed sensing). Other artifacts may be in the image.
Any level of artifact may exist. The scan settings for the medical scanner, condition of the patient, amount of movement, filtering, reconstruction, other image processing, and/or other factors may contribute to different levels of artifacts in an image. One image may include one type or multiple types of artifacts. The level may be a function of the severity (e.g., intensity or contrast) and/or extent (e.g., distribution or number of instances).
The image for a given patient is to be corrected. For example, the correction is for CT, X-ray and MR denoising, MR and ultrasound reconstruction, or MR super-resolution.
The correction uses a probability from a generative model to model what a good image is. In act 32, a machine determines a probability of artifact abnormality for the image with a deep generative machine-learnt model. Any machine capable of applying the deep generative machine-learnt model may be used. For example, a computer inputs the image to learned matrices or a matrix learned as the deep generative machine-learnt model. The generative model encodes training data to a few independent latent variables or learned features and may generate synthetic data by sampling the latent variables. For the inverse solution to correct image distortion, the generation of synthetic data is not used. The learnt model takes an image as input, but other inputs may be provided, such as clinical data of a patient. The generative model is implemented as a piecewise-differentiable function, such as used in deep learning, and returns a prior log-likelihood.
Any machine-learnt generative model may be used. For example, the generative model is a deep learnt model using Restricted Boltzmann Machines (RBM), Deep Belief Network (DBN), Neural Autoregressive Density Estimators (NADE), Variational Auto-Encoders (VAE), Denoising Auto-Encoders (DAE), Generative Adversarial Networks (GAN), extensions thereof, or other deep learning approaches for generative modeling. In one embodiment, the trained deep generative model is a deep neural network with a set of j convolutional layers and k fully connected layers, each followed by a non-linear activation function, and a set of pooling layers for features reduction. Other layer arrangements may be used.
In deep learning, the latent variables are learned by machine training. The machine learns the generative model from training images of a database. In one embodiment, training is performed using standard generative training. The training images from the database are of similar quality, such as a desired good quality that the generative model will represent. The level of artifact or quality score is above or below a threshold level, depending on whether higher or lower scores indicate better quality. Subjective selection may alternatively or additionally be used. All the images used for training the generative model are of good or the top image quality. Any threshold for quality may be used for the training images, such as only a score of 5 in a score range of 1-5 where 5 is the best quality. In alternative embodiments, a broader range (e.g., moderate level, low level, and/or no artifacts in the images) is used. In yet other embodiments, images of any quality level are used.
The training images are subject to no or little distortion or artifacts. Rather than finding a large number of images associated with a particular artifact, a large number of images with no or little distortion is found. Locating better quality images may be easier than locating images with a particular artifact, as images with artifacts tend not to be stored or recorded. To increase the number of available images of sufficient quality, images corrected using other algorithms may be used. Any number of different approaches are provided for correcting for particular sources of corruption. By applying these approaches, images with reduced artifact or distortion are provided. The images from multiple different types of artifact correction may provide training images. This holistic data acquisition process results in a larger quantity of training data than discriminative approaches, which may train each network only on images featuring one kind of distortion.
To train the generative model, the model is fed with a set of images of good or similar quality (e.g., determined by their score). The log-likelihood of the output is maximized. Since the output is a probability distribution (i.e., constrained to sum to 1 over all images), training also implicitly minimizes the likelihood of images not present in the training set, such as low-quality images. The generative model encodes features that represent the good qualities in the images. Since generative training is unsupervised, training does not require matched pairs of good and bad images, which are difficult to acquire on a large scale in a medical setting. Obtaining both good and bad images requires scanning patients twice, which results in extra dose and/or scan time with no immediate benefit to the patient.
Since the generative model is used to correct poor quality images, those poor images may also be included in the training of the generative model, provided the correction is also part of the training. In general, training finds the model parameters that minimize the expectation of some loss over the data. One example minimization of the expectation is represented as:
$$\hat{\theta} = \arg\min_{\theta}\; \mathbb{E}\left[\mathcal{L}(X, Y, \theta)\right]$$
where θ is the set of trainable parameters of the prior (e.g., the weights of a neural network), $\mathbb{E}$ is the expectation, Y is the image data, X is the ideal image, and $\mathcal{L}$ is the loss function. The training may be with X and Y (supervised), only X (unsupervised from only good images), or only Y (unsupervised from bad images). If bad images are used, then the training includes correction. The unsupervised Maximum Likelihood loss function from good images may be represented as:
$$\mathcal{L}(X, \theta) = -\log p_\theta(X).$$
The unsupervised Maximum A Posteriori loss function used to restore corrupted data may also be used to optimize θ. It may be represented as:
$$\mathcal{L}(Y, \theta) = \min_{X}\left[-\log p(Y \mid X) - \log p_\theta(X)\right].$$
Both of those examples are unsupervised because the examples do not use a matching (X,Y) pair of the corrupted data and target image for the same scan. Other representations, loss functions, or expectation minimization functions may be used.
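To make the two unsupervised losses concrete, the following sketch evaluates them for a deliberately simple stand-in prior. It assumes an i.i.d. Gaussian prior on each pixel with parameters θ = (mu, s) and an additive Gaussian noise model Y = X + n; the function names are hypothetical, and the Gaussian prior replaces the deep generative model only so that the inner minimization over X has a closed form.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def neg_log_prior(x, mu, s):
    """-log p_theta(X) for an assumed i.i.d. Gaussian prior N(mu, s^2) on each pixel."""
    return sum(0.5 * ((xi - mu) / s) ** 2 + math.log(s) + 0.5 * LOG_2PI for xi in x)


def neg_log_likelihood(y, x, sigma_n):
    """-log p(Y|X) for an assumed additive Gaussian noise model Y = X + n."""
    return sum(0.5 * ((yi - xi) / sigma_n) ** 2 + math.log(sigma_n) + 0.5 * LOG_2PI
               for yi, xi in zip(y, x))


def ml_loss(x, mu, s):
    """Maximum-likelihood loss from good images: L(X, theta) = -log p_theta(X)."""
    return neg_log_prior(x, mu, s)


def map_loss(y, mu, s, sigma_n):
    """MAP loss from corrupted data: min over X of -log p(Y|X) - log p_theta(X).

    In this all-Gaussian toy the inner minimizer is the precision-weighted mean of
    the measurement and the prior mean; returns (loss, minimizing X).
    """
    w_data, w_prior = 1.0 / sigma_n ** 2, 1.0 / s ** 2
    x_hat = [(w_data * yi + w_prior * mu) / (w_data + w_prior) for yi in y]
    return neg_log_likelihood(y, x_hat, sigma_n) + neg_log_prior(x_hat, mu, s), x_hat
```

The ML loss needs clean images X, while the MAP loss only needs corrupted measurements Y, which is why either can drive unsupervised training of θ.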
The generative nature is used to determine a model of a good quality image. The generative model provides features or kernels used to indicate a good quality image. Being based on good quality images, the generative model is generic to the type of artifact. The same generative model may be used for correction of images suffering from any of various types of artifacts.
The generative model is trained with deep machine learning to output a probability of an input image matching good qualities. The probability is a probability of normality or abnormality. Abnormality reflects the likelihood of an artifact. One probability is output for the image. In alternative embodiments, a spatial distribution of probabilities is output, such as calculating the probability for each pixel or voxel based on the intensities or values of surrounding or neighboring pixels or voxels.
The model parameters (e.g., machine trained features, kernels or layer values) are used to compute the probability that the image fits the generative model of good quality. Images that do not match the generative model will have a low probability, hence identifying the image as including one or more potential abnormalities. Images that do match the generative model will have a high probability. Inverse probabilities may be used (e.g., subtract the probability from 1.0). Matching to a generative model of poor or low quality images may be used in other embodiments.
In one embodiment, the deep-generative machine-learnt model is learnt from training images with different types of artifacts including the artifact. The artifacts may be minor, so the training images have a quality above the threshold. In other embodiments, the deep generative machine-learnt model is learnt or trained with multi-objective training. Multi-objective training includes training images labeled as having quality below a threshold or poor quality. Instead of using just good quality images, images with poor quality are also used. The generative model is initially trained with the good quality images, and then the generative model is refined using the multi-objective training to learn a model that can generate data in general and perform well on the task. A semi-supervised or supervised approach is used for refining so that the generative model learns characteristics of the good quality images that distinguish from poor quality images. Both good and poor quality images with ground truth labels are used. This supervised loss from corrupted data may be represented as:
Multiple losses may be combined to train on all available data (good or all kinds of corruption, with or without supervision). The corruption in the poor-quality images is selected to be specific to the problem, such as noise or motion blur. In one embodiment, the generative model is trained on the good quality images. A collection of bad quality images suffering from a particular type of corruption is used to refine or fine-tune the generative model to make the generative model specific to the type of corruption.
Alternatively, training images having poor quality due to any number of types of distortion may be used to provide a generative model generic to the type of distortion.
While the proposed framework aims at greater flexibility and learnability, any generative model is likely to compute more information than is strictly required for any regularization task of the inverse problem. If computational performance at test time (i.e., during application to a patient's image) is an issue, then a further discriminative network may be trained to distill the inference algorithm. A smaller specialized discriminative network is machine trained to reproduce the results of the complete inference of the generative model. A support vector machine, probabilistic boosting tree, deep learnt neural network, Bayesian, or another machine learnt discriminative network is trained based on the generative model. The generative model provides automated supervision by indicating the ground truth quality of input images, thus increasing the amount of data available for supervised training of the discriminative network. This smaller discriminative network learns from an unlabeled set of corrupted and/or non-corrupted images, using the inference algorithm of the generative model as ground truth.
The resulting discriminative network is applied at test time to the image to output a probability where the probability is based on training from the deep generative machine-learnt model. The discriminative network may be faster than the generative model and requires less annotated data than supervised training from scratch as the generative model provides the ground truth for training.
In other embodiments, the generative model is an invertible network. The machine determines the likelihood of the first image having the distortion or probability (e.g., of artifact abnormality or noise) for the image with a machine-learnt invertible network. The deep generative machine-learnt model is, at least in part, an invertible network. The generative network uses a normalizing flow (e.g., invertible network) as the prior model.
The invertible network is a closed-form expression of the prior probability. The invertible network may use random noise, such as a Gaussian white noise or other noise distributions. A source-filter generative model is formed from a random source z (e.g. white noise), as represented by:
$$z \sim p_z(z), \qquad x = D(z)$$
where D is a decoder, $p_z$ is the probability density of z, and x is the image.
Because the decoder D is invertible, its inverse defines an encoder $E = D^{-1}$, and the change-of-variables formula gives the prior probability of the image, $p_x(x)$, as:
$$p_x(x) = p_z\big(E(x)\big)\,\left\lvert \det J_E(x) \right\rvert$$
where $J_E(x)$ is the Jacobian of E evaluated at x and det denotes its determinant.
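As a minimal check of the change-of-variables formula, the sketch below assumes a hypothetical pixel-wise affine decoder D(z) = a·z + b with a standard Gaussian source. For this decoder, p_x is a Gaussian with mean b and standard deviation |a|, so the formula can be compared with the density computed directly. A real invertible network would replace the affine map with learned blocks.

```python
import math


def log_std_normal(z):
    """log p_z(z) for a standard Gaussian white-noise source (assumed source density)."""
    return sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)


def decoder(z, a, b):
    """Hypothetical invertible decoder D(z) = a * z + b, applied pixel-wise (a != 0)."""
    return [a * zi + b for zi in z]


def encoder(x, a, b):
    """E = D^{-1}: maps an image back to the noise that would have generated it."""
    return [(xi - b) / a for xi in x]


def log_prior(x, a, b):
    """log p_x(x) = log p_z(E(x)) + log|det J_E(x)|.

    J_E is diagonal with entries 1/a, so log|det J_E| = -n * log|a|.
    """
    log_det = -len(x) * math.log(abs(a))
    return log_std_normal(encoder(x, a, b)) + log_det
```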
Invertible neural networks are a particular neural network architecture in which every representable function is invertible. The same network can be applied in forward and reverse mode to compute D and E. The determinant of the Jacobian is easy to compute: by the chain rule, the Jacobian of a stack of blocks is the product of the blocks' Jacobians, and the determinant of a product is the product of the determinants.
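The sketch below illustrates a stack of atomic invertible blocks under stated assumptions: NICE-style additive coupling blocks (unit Jacobian determinant), a reversal permutation, and a pixel-wise scaling block are an assumed choice of blocks, and all class names and weights are hypothetical. It shows the same blocks run forward to compute E and in reverse to compute D, with the log-determinant of the stack accumulated as the sum of per-block log-determinants.

```python
import math


class AdditiveCoupling:
    """Shift the second half by a function of the first half (even-length input assumed).
    The Jacobian is triangular with unit diagonal, so log|det J| = 0."""
    def __init__(self, w):
        self.w = w  # hypothetical learnable weight of the shift function

    def _shift(self, h):
        return [math.tanh(self.w * v) for v in h]

    def forward(self, x):
        k = len(x) // 2
        x1, x2 = x[:k], x[k:]
        return x1 + [u + m for u, m in zip(x2, self._shift(x1))], 0.0

    def inverse(self, y):
        k = len(y) // 2
        y1, y2 = y[:k], y[k:]
        return y1 + [u - m for u, m in zip(y2, self._shift(y1))]


class Reverse:
    """Permutation so later couplings update the other half; |det J| = 1."""
    def forward(self, x):
        return x[::-1], 0.0

    def inverse(self, y):
        return y[::-1]


class Scale:
    """Pixel-wise scaling by exp(s_i); log|det J| = sum(s_i)."""
    def __init__(self, s):
        self.s = s

    def forward(self, x):
        return [xi * math.exp(si) for xi, si in zip(x, self.s)], sum(self.s)

    def inverse(self, y):
        return [yi * math.exp(-si) for yi, si in zip(y, self.s)]


class InvertibleStack:
    def __init__(self, blocks):
        self.blocks = blocks

    def encode(self, x):
        """E(x): apply the blocks in order; log-determinants add up by the chain rule."""
        log_det = 0.0
        for block in self.blocks:
            x, ld = block.forward(x)
            log_det += ld
        return x, log_det

    def decode(self, z):
        """D(z) = E^{-1}(z): the same blocks applied in reverse mode."""
        for block in reversed(self.blocks):
            z = block.inverse(z)
        return z

    def log_prob(self, x):
        """log p_x(x) = log p_z(E(x)) + log|det J_E(x)| with a standard Gaussian source."""
        z, log_det = self.encode(x)
        log_pz = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
        return log_pz + log_det
```

Coupling blocks are one common way to keep each block cheap to invert and its determinant trivial; other invertible block designs may be used.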
When a random source with known probability distribution (e.g. white Gaussian noise) is presented at the input, this allows the probability of the output to be computed in closed form using the change of variable formula. Since the network is invertible, the network can also be used in reverse mode to present an image and compute the random source that would have generated the image. The network can be trained to maximize the expected log-probability of the training images.
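As a toy illustration of training by maximizing the expected log-probability, the sketch below fits a hypothetical invertible map x = exp(log_a)·z + b, with a standard Gaussian source, to a handful of scalar samples by gradient ascent on the mean log-likelihood. The gradients are written out by hand for this specific map as a stand-in for backpropagation through a deep invertible network; at convergence, b and exp(log_a) approach the sample mean and the population standard deviation.

```python
import math


def fit_affine_flow(samples, lr=0.1, iters=2000):
    """Maximize the mean log p_x(x) for E(x) = (x - b) / a, a = exp(log_a), z ~ N(0, 1).

    Per sample: log p_x(x) = -0.5 * ((x - b) / a)^2 - log_a - 0.5 * log(2 * pi).
    """
    b, log_a = 0.0, 0.0
    n = len(samples)
    for _ in range(iters):
        a = math.exp(log_a)
        z = [(x - b) / a for x in samples]               # encode each training image
        grad_b = sum(zi / a for zi in z) / n              # d/db of the mean log-likelihood
        grad_log_a = sum(zi * zi - 1.0 for zi in z) / n   # d/dlog_a of the mean log-likelihood
        b += lr * grad_b                                  # gradient ascent step
        log_a += lr * grad_log_a
    return b, math.exp(log_a)
```

Parameterizing the scale through log_a keeps a positive, so the map stays invertible throughout training.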
Due to the use of the invertible network, the generative model is trained with a joint optimization in the unsupervised learning. Both images (imaging) and weights of the invertible network are trained at the same time or as part of the same optimization. The invertible network is trained to minimize loss with respect to both reconstructed images and the learnable parameters. The use of a probability density function of noise, such as Gaussian white or other noise, allows for the joint optimization. The invertible nature and the determinant of the Jacobian based on the change of variable between image, noise, and probability, provide the joint optimization. One loss is computed over both the weights as learnable parameters of the invertible network and the reconstructed images.
The invertible network is trained for use in the inverse solution framework as the regularizer. Given a training set of input measurements y and an invertible network with weights θ, the weights and the set of reconstructed images x are jointly optimized to maximize the expected maximum a posteriori (MAP) probability of each image given its measurements. The MAP loss function is represented as:
$$\min_{\theta}\; \mathbb{E}_{y}\left[\min_{x}\left(-\log p(y \mid x) - \log p_z\big(E_\theta(x)\big) - \log\left\lvert \det J_{E_\theta}(x) \right\rvert\right)\right]$$
The encoder, $E_\theta$, and the determinant of the Jacobian, $\det J_{E_\theta}$, are both functions of the imaging, x, and the weights, θ. The one loss provides a joint optimization over both x and θ.
This differentiable min-min problem can be solved during training by joint gradient descent. In one embodiment, a stochastic optimization is used for training the invertible network. The current image, X, for every input data, Y, is stored between epochs, so a memory is provided during training.
The training may use minibatches or subsets of training data at each time due to the joint optimization. To avoid data growth over the entire training dataset, minibatches or subsets are used to separately train and update. Training can be performed iteratively by joint stochastic gradient descent over the weights and a current minibatch of images, which only requires a memory to store the current status of each reconstruction between epochs.
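The joint optimization and the per-example memory can be sketched on a toy denoising problem. Everything here is an assumption for illustration: scalar measurements y_i = x_i + n with Gaussian noise, an invertible prior x = a·z + b whose only learnable weight is the shift b, and hand-derived gradients. The scale a is fixed because, in this all-Gaussian toy, also minimizing over a would let the prior variance collapse to zero. The dictionary `memory` (a hypothetical name) holds the current reconstruction for every input across epochs, and each minibatch updates both its reconstructions and the shared weight.

```python
import random


def joint_sgd(measurements, sigma_n=1.0, a=1.0, lr=0.1, epochs=1000, batch_size=2, seed=0):
    """Toy joint SGD over the prior weight b and the stored reconstructions x_i.

    Assumed model (illustration only): y_i = x_i + n with n ~ N(0, sigma_n^2), and an
    invertible prior x = a * z + b with z ~ N(0, 1). Only the shift b is learned.
    """
    rng = random.Random(seed)
    b = 0.0                                               # learnable weight of the prior
    memory = {i: y for i, y in enumerate(measurements)}   # current x_i per y_i, kept across epochs
    order = list(memory)
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_b = 0.0
            for i in batch:
                x, y = memory[i], measurements[i]
                # Per-instance loss: (x - y)^2 / (2 sigma_n^2) + ((x - b) / a)^2 / 2 + const
                grad_x = (x - y) / sigma_n ** 2 + (x - b) / a ** 2
                grad_b -= (x - b) / a ** 2
                memory[i] = x - lr * grad_x                # update the stored reconstruction
            b -= lr * grad_b / len(batch)                  # update the weight from the minibatch
    return b, [memory[i] for i in range(len(measurements))]
```

With the full dataset as one batch, the iterates converge to b equal to the mean of the measurements and x_i halfway between y_i and b (for sigma_n = a = 1). With smaller minibatches and a constant step, b fluctuates around that value, as is typical for stochastic gradient descent.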
The loss is determined for each instance of the reconstruction image within the minibatch. In other embodiments, a total loss over all of the images of the minibatch as well as the instance loss per instance of the images are used to train. Multi-objective training or training based on good quality images may be used.
Where the invertible network is used, the training data may be formed from data similar to the data applied at test time or during application. The training data may include good and/or bad images without labels or specific selection of images that are good and/or bad. The training data have the same quality as the data acquired for patients.
This invertible network and corresponding inverse solution may be applied to either image postprocessing tasks (i.e., starting from an image) or reconstruction tasks (i.e., starting from the measurements prior to reconstruction). Starting from the measurements allows the noise in the measurements to be accurately modeled by core physics. To model the artifacts in the image domain, the noise is propagated through the reconstruction, which may introduce approximations in the physical model.
By using a machine-trained invertible network, a complete generative model is incorporated into the image correction pipeline. The model is reusable as a regularizer for other acquisition protocols. Since invertible networks also allow sampling and provide a closed-form expression of the prior probability, the model could also be used after training in other applications such as image synthesis (e.g., data augmentation for analytics tasks) or probabilistic reasoning. Unlike unrolled iterative networks, which run for a fixed number of iterations, this approach does not make the reconstruction a computational bottleneck during training. In unsupervised learning at convergence, the invertible network disentangles the irreducible posterior variance, due to the quality of the input data, from the error that could be reduced with more reconstruction time. Once the generative model has been trained, supervised knowledge distillation may be used to obtain a targeted reconstruction architecture.
In act 34 of
The optimization solves for the corrected image from the input image. The solution is an inverse problem. The acquisition of the measurements or image may be modeled. A physics model relates the degrading or corrupting operation of an imaging system to measurements. This data acquisition model may be represented as:
$$p(Y \mid X)$$
where Y denotes the measurements, corrupted in some way (e.g., blur or noise), and X is the ideal or corrected image.
The desired image may be represented as a parametric prior image model:
$$p_\theta(X)$$
The recovery problem (i.e., the inverse solution) is a function of the posterior p(X|Y). Bayes' rule provides that:
$$p(X \mid Y) \propto p(Y \mid X)\, p_\theta(X)$$
The solution may be expressed as a maximum a posteriori (MAP) estimate, such as:
$$\hat{X} = \arg\max_X \left[ \log p(Y \mid X) + \log p_\theta(X) \right]$$
Other representations may be used.
This inverse solution uses the parametric prior image model, with the deep-learnt generative model as the prior. The inverse solution uses the probability output by the generative model as the regularizer term. The regularizer term measures how well an alteration of the measurements or input image Y fits an ideal or good quality image; that is, it indicates whether a given correction makes the resulting corrected image more like good quality or corrected images. The generative model provides the probability that the corrected image is of good quality. The log-likelihood, or another likelihood derived therefrom, output by the machine-learnt generative model is used as the regularizer in the inverse solution.
The inverse solution uses the physics model of the degrading behavior in acquisition in combination with the generative model. Any physics model representing the physical operation of the imaging system to measure or scan may be used. The physics model encodes the transform of the imaging system. The physics model represents at least one image degradation characteristic of how the imaging system works.
Different physics models may be provided for different degradations. For example, the physics model is directed to noise. The noise distribution is typically modeled as Gaussian for MRI and as Poisson for CT. The physics model includes a characteristic specific to the imaging system or medical scanner. The type of imaging system (e.g., CT vs. MRI) provides the specificity. Alternatively, the noise distribution for a specific scanner model is used, since different models have different arrangements, resulting in different noise distributions. In other alternatives, the noise distribution of an individual imaging system as built is determined through calibration and used as the physics model.
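For illustration, the two noise models named above can be written as data terms −log p(Y|X). This sketch assumes a known linear forward operator A given as a matrix, and the function names are hypothetical. The Gaussian term is the L2-norm physics term of the linear white-Gaussian model; for transmission CT the Poisson mean is taken here as the Beer-Lambert expected photon count I0·exp(−AX), where the incident intensity `i0` is an assumed calibration value.

```python
import numpy as np

def gaussian_nll(y, x, A, sigma):
    """Gaussian noise (commonly assumed for MRI): ||Y - AX||^2 / (2 sigma^2), constants dropped."""
    r = y - A @ x
    return float(r @ r) / (2.0 * sigma**2)

def ct_expected_counts(x, A, i0):
    """Transmission CT: expected photon counts i0 * exp(-A x) (Beer-Lambert law)."""
    return i0 * np.exp(-(A @ x))

def poisson_nll(y, expected, eps=1e-12):
    """Poisson counts (commonly assumed for CT): sum(expected - y * log(expected)), log(y!) dropped."""
    return float(np.sum(expected - y * np.log(expected + eps)))

# Usage: identity operator, two pixels or detector bins.
A = np.eye(2)
g = gaussian_nll(np.array([1.0, 3.0]), np.array([0.0, 1.0]), A, sigma=1.0)  # 2.5
counts = np.array([2.0, 5.0])
better = poisson_nll(counts, counts)        # expected counts equal to the observed counts
worse = poisson_nll(counts, 2.0 * counts)   # expected counts twice too high
```

Either function can serve as the physics term of the inverse solution; swapping one for the other changes the data fit without touching the generative prior.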
The physics model may include distributions for one or more characteristics. The sensitivity, motion (of the object or imaging system), collimator effects, noise, reconstruction, masking, compression, and/or other artifact-causing operations of the imaging system are modeled.
The physics model and the generative model are used together in the inverse problem solution. An example using a linear model with white Gaussian noise for the physics model is:
$$\hat{X} = \arg\min_X \frac{1}{2\sigma^2} \lVert Y - AX \rVert_2^2 - \log p_\theta(X)$$
where the squared L2-norm term of (Y − AX) carries the physics model, with σ as the noise level (standard deviation) and A as the corruption operator applied to the image, and the regularizer term, log pθ(X), is the log-likelihood from the generative model. Other representations may be used, including non-linear representations.
The inverse solution is optimized by gradient descent. Deep models are usually trained by backpropagation, which computes the gradients for a gradient descent with respect to the inner parameters of the generative model. The inverse solution is likewise performed as a gradient descent, here with respect to the image. Any gradient-based optimization may be used, such as bi-conjugate gradient or L-BFGS. In an iterative process, the measurements or image Y, after alteration, are input to the generative model, providing a probability for that iteration. Other changes are tested in other iterations to identify the alterations providing a sufficiently optimal, best, or optimal corrected image X. The objective is minimized by gradient descent using the physics model in one term and the probability from the generative model in the other term (i.e., the regularizer).
The gradient of the regularizer is computed. In one embodiment, the likelihood is the output of the encoder (i.e., the generative model), and the encoder is implemented as a piecewise differentiable function such as a neural network. The gradient of the likelihood is then computed by backpropagation with respect to the corrected image X. For example, the denoising auto-encoder of the generative model provides a decoder output that directly gives the gradient. One example minimization by gradient descent is represented as:
$$X_{t+1} = X_t - \eta \left[ \frac{1}{\sigma^2} A^{T} (A X_t - Y) - \nabla_X \log p_\theta(X_t) \right]$$
where η is the step size and t is the iteration index.
This minimization uses the physics model and the generative model produced likelihood to solve for the corrected image from an input image. The level of the artifact or distortion in the input image Y is minimized, providing the corrected image X. Other minimizations may be used.
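The gradient-descent minimization can be sketched as follows. Because no trained generative model is available here, the sketch assumes a hypothetical stand-in prior pθ(X) = N(0, τ²I), whose log-likelihood gradient is −X/τ²; in the described approach this gradient would instead come from backpropagation through the generative model. The forward operator A is an assumed small 1-D blur, the step size 1/L uses an upper bound L on the curvature of the objective, and with this Gaussian prior the result can be checked against the closed-form solution.

```python
import numpy as np

def grad_log_prior(x, tau):
    """Hypothetical stand-in for the generative model: gradient of log N(x; 0, tau^2 I)."""
    return -x / tau**2

def map_gradient_descent(y, A, sigma, tau, iters=2000):
    """Minimize ||Y - AX||^2 / (2 sigma^2) - log p(X) by plain gradient descent."""
    # Step 1/L, where L bounds the curvature (Lipschitz constant of the gradient).
    lipschitz = np.linalg.norm(A, 2) ** 2 / sigma**2 + 1.0 / tau**2
    step = 1.0 / lipschitz
    x = A.T @ y                                    # simple initial guess
    for _ in range(iters):
        grad_data = A.T @ (A @ x - y) / sigma**2   # physics-model term
        grad = grad_data - grad_log_prior(x, tau)  # regularizer enters as -grad log p
        x = x - step * grad
    return x

# Usage: a small 1-D blur operator and a noisy blurred signal.
n = 5
A = 0.5 * np.eye(n) + 0.25 * (np.eye(n, k=1) + np.eye(n, k=-1))
x_true = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
rng = np.random.default_rng(0)
sigma, tau = 0.1, 1.0
y = A @ x_true + sigma * rng.standard_normal(n)
x_hat = map_gradient_descent(y, A, sigma, tau)
```

Replacing `grad_log_prior` with the backpropagated gradient of a learnt model leaves the loop unchanged, which is the point of separating the physics term from the regularizer.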
In one embodiment, the prior is trained as a denoising auto-encoder D. Its output D(X) is the minimum mean square error (MMSE) denoised image of its input, assuming Gaussian white noise of level σ, and the gradient of the prior can be computed as:
$$\nabla_X \log p_\theta(X) \approx \frac{D(X) - X}{\sigma^2}$$
This estimate of the gradient becomes more accurate as the noise level σ used to train the denoising auto-encoder becomes smaller. It is asymptotically exact as the noise level tends to 0.
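The denoising auto-encoder estimate of the prior gradient can be checked in a case where everything is known in closed form. This toy sketch assumes a scalar Gaussian prior X ~ N(0, τ²), for which the exact MMSE denoiser is a linear shrinkage, so (D(x) − x)/σ² = −x/(τ² + σ²). That value approaches the true gradient −x/τ² as σ shrinks, illustrating the asymptotic exactness described in the text. The function names are hypothetical.

```python
def mmse_denoiser(x_noisy, tau, sigma):
    """Exact MMSE denoiser for X ~ N(0, tau^2) observed with N(0, sigma^2) noise (toy assumption)."""
    return tau**2 / (tau**2 + sigma**2) * x_noisy

def dae_prior_gradient(x, tau, sigma):
    """Auto-encoder estimate of grad log p(x): (D(x) - x) / sigma^2."""
    return (mmse_denoiser(x, tau, sigma) - x) / sigma**2

def true_prior_gradient(x, tau):
    """Exact gradient of log N(x; 0, tau^2)."""
    return -x / tau**2

# Usage: the estimation error shrinks as the training noise level sigma decreases.
x, tau = 2.0, 1.0
errors = [abs(dae_prior_gradient(x, tau, s) - true_prior_gradient(x, tau))
          for s in (1.0, 0.5, 0.1, 0.01)]
```

With a learnt auto-encoder the same formula applies unchanged; only `mmse_denoiser` would be replaced by the trained network.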
In alternative embodiments, Bayesian methods are used instead of gradient descent. For example, variational Bayes, message passing, expectation propagation, or another approach is used. The expectation of the loss may be estimated by Monte Carlo simulation. Integrators may be learned along with the generative model to bypass Markov chain Monte Carlo (MCMC) sampling, with a side network learnt to compute the expectation.
Rather than using the denoising auto-encoder or Bayesian methods, the invertible network is used. The optimization includes a regularizer from a log-likelihood output by a machine-learnt normalizing flow. The generative model implementing the regularizer is an invertible network, such as a stack of atomic invertible blocks. The machine-learnt normalizing flow was trained by joint optimization over the reconstructed images and the learnable parameters of the normalizing flow, using a known probability distribution.
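To illustrate why an invertible network gives a closed-form log-likelihood, the following toy sketch stacks affine coupling layers (the RealNVP construction, used here as one common choice of atomic invertible block; the text does not specify the block type). The weights are random rather than trained, and the names `AffineCoupling`, `encode`, `decode`, and `log_likelihood` are hypothetical. Each block leaves half of its input unchanged and uses it to scale and shift the other half, so the block is exactly invertible and its log-determinant is the sum of the log-scales; running the stack backwards turns base-distribution noise into samples.

```python
import numpy as np

class AffineCoupling:
    """One atomic invertible block (affine coupling) with random toy weights."""

    def __init__(self, dim, rng):
        self.h = dim // 2
        self.W_s = 0.1 * rng.standard_normal((dim - self.h, self.h))
        self.W_t = 0.1 * rng.standard_normal((dim - self.h, self.h))

    def forward(self, x):
        x1, x2 = x[: self.h], x[self.h :]
        s = np.tanh(self.W_s @ x1)                 # bounded log-scale
        z2 = x2 * np.exp(s) + self.W_t @ x1
        return np.concatenate([x1, z2]), float(np.sum(s))   # output, log|det J|

    def inverse(self, z):
        z1, z2 = z[: self.h], z[self.h :]
        s = np.tanh(self.W_s @ z1)
        return np.concatenate([z1, (z2 - self.W_t @ z1) * np.exp(-s)])

def encode(x, blocks):
    """Map an image to the latent z, accumulating log|det J| over the stack."""
    logdet = 0.0
    for blk in blocks:
        x, ld = blk.forward(x)
        logdet += ld
        x = x[::-1]        # reversal permutation (|det| = 1) so both halves get transformed
    return x, logdet

def decode(z, blocks):
    """Exact inverse of encode."""
    for blk in reversed(blocks):
        z = blk.inverse(z[::-1])
    return z

def log_likelihood(x, blocks):
    """Closed-form log p(x) with a standard normal base distribution."""
    z, logdet = encode(x, blocks)
    return -0.5 * float(z @ z) - 0.5 * z.size * np.log(2 * np.pi) + logdet

# Usage: a 4-block flow on 6-dimensional inputs; sample, then score the sample.
rng = np.random.default_rng(0)
blocks = [AffineCoupling(6, rng) for _ in range(4)]
x_new = decode(rng.standard_normal(6), blocks)   # sampling = decoding base noise
ll = log_likelihood(x_new, blocks)
```

The tanh bound on the log-scale is a stability choice for this sketch; it keeps each scale factor between exp(−1) and exp(1).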
In act 36, a control is provided for the optimization. The control is provided to constrain the optimization and/or as a change for varying the input image Y for each iteration of the optimization. For example, the machine provides a direction, step size, number of iterations, preconditioner, or combinations thereof in the minimizing.
The control may be randomized, a default, a user-selected setting, or based on a pattern. For the control of altering the images, the control is based on information from previous iterations and/or from a current iteration. The amount, direction, and/or other variation used to alter the measurements Y from the original input and/or the most recent iteration are provided. The number of iterations, step sizes, preconditioners (e.g., a matrix applied to the gradient to refine its direction), and/or other settings for the optimization are provided.
In one embodiment, the gradient direction for each iteration is determined from the deep generative machine-learnt model. The gradient direction indicates the direction of the change (positive or negative) for the next iteration in the minimization. The same backpropagation method used with the generative model is used at test time on the image data to obtain a gradient direction that increases the likelihood of the image fitting the generative model. Because the data-fitting part of the problem is usually smooth, the whole inverse problem may be solved by gradient descent.
In an additional or alternative embodiment, the control or controls are provided by a machine-learnt classifier applied to the image. For example, the step sizes are learnt (for each A) for fast convergence at test time. Algorithmic parameters of the optimization, such as step sizes, number of iterations, and preconditioners, are learnt by backpropagation on a problem-specific basis. For the given artifact and/or specific imaging system, machine learning is used to classify the change given an input image, the gradient of the likelihood, and/or other information used in the minimization. The machine-learnt classifier for the change may be trained on data synthesized from good images and the physical model of the degradation. This training is performed in a supervised way, such as using probabilistic boosting tree or support vector machine learning with labeled ground truth data. Since the resulting machine-learnt classifier involves fewer parameters than the generative model, it needs less training data and provides output more rapidly.
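Learning an algorithmic parameter such as the step size can be sketched by differentiating the result of a fixed number of unrolled iterations with respect to that parameter. This toy sketch assumes a quadratic objective f(x) = 0.5·xᵀHx − cᵀx (the form the inverse problem takes with a Gaussian prior, where H is fixed by A and σ and each training problem supplies its own c from its measurements). It propagates the derivative forward through the loop (forward-mode differentiation, swapped in here for backpropagation because only one scalar parameter is learnt; both yield the same derivative). The helper names are hypothetical.

```python
import numpy as np

def unrolled_loss_and_grad(eta, H, c, iters):
    """Run `iters` gradient steps x <- x - eta * (Hx - c) from x = 0 on
    f(x) = 0.5 x^T H x - c^T x; return f(x_K) and d f(x_K) / d eta."""
    x = np.zeros_like(c)
    v = np.zeros_like(c)                     # v = dx / d eta, carried through the loop
    for _ in range(iters):
        g = H @ x - c
        v = v - g - eta * (H @ v)            # derivative of the update x - eta * g(x)
        x = x - eta * g
    f = 0.5 * x @ H @ x - c @ x
    return float(f), float((H @ x - c) @ v)  # chain rule: grad f(x_K) . dx_K / d eta

def learn_step_size(H, cs, iters=5, eta=0.1, lr=0.02, epochs=200):
    """Gradient descent on the step size, averaged over a set of training problems."""
    for _ in range(epochs):
        grad = np.mean([unrolled_loss_and_grad(eta, H, c, iters)[1] for c in cs])
        eta -= lr * grad
    return float(eta)

# Usage: one fixed operator (H) and two training problems (c vectors).
H = np.diag([1.0, 4.0])
cs = [np.array([1.0, 4.0]), np.array([-1.0, 2.0])]
eta_learned = learn_step_size(H, cs)
```

The learnt step size is tied to this H (that is, to one A and σ), which mirrors the problem-specific learning described in the text.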
After the default or set number of iterations, the resulting corrected image is output. Alternatively, the corrected image is output once the gradient becomes zero (i.e., the alteration resulting in the peak match is found).
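The controls discussed in this section (step size, number of iterations, preconditioner) and the two stopping rules can be combined in one loop. In this illustrative sketch, which is not the described system, the preconditioner is an assumed fixed matrix applied to the gradient (here the Jacobi choice, the inverse of the diagonal of the curvature), and "the gradient becomes zero" is implemented numerically as the gradient norm falling below a tolerance `tol`. The name `preconditioned_gd` is hypothetical.

```python
import numpy as np

def preconditioned_gd(grad_fn, x0, step, P=None, max_iters=1000, tol=1e-8):
    """Iterate x <- x - step * P @ grad(x). Stop after max_iters iterations or
    once the gradient norm drops below tol; return the image and the count used."""
    x = np.asarray(x0, dtype=float).copy()
    P = np.eye(x.size) if P is None else P
    for k in range(max_iters):
        g = grad_fn(x)
        if np.linalg.norm(g) < tol:      # numerical stand-in for "gradient is zero"
            return x, k
        x = x - step * (P @ g)
    return x, max_iters

# Usage: an ill-conditioned quadratic objective with gradient H x - c.
H = np.diag([1.0, 100.0])
c = np.array([1.0, 1.0])

def grad(x):
    return H @ x - c

x_plain, k_plain = preconditioned_gd(grad, np.zeros(2), step=1.0 / 100.0)
x_pre, k_pre = preconditioned_gd(grad, np.zeros(2), step=1.0,
                                 P=np.diag(1.0 / np.diag(H)))
```

Without the preconditioner, the step size must respect the stiffest direction, so the loop exhausts its iteration budget; with the diagonal preconditioner, this diagonal problem is solved almost immediately and the gradient test ends the loop.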
In act 38, the machine transmits the corrected image output from the minimization or other optimization. For example, the image of the patient from the medical scanner as corrected is transmitted. The corrected image has less or no distortion previously included in the image. The level of artifact is reduced or minimized.
The transmission is over a network, through a communications interface, into memory or database (e.g., to a computerized patient medical record), or to a display. For example, the corrected image (optimized image) is displayed on a display device.
The optimization uses the generative model, leveraging unlabeled data in the form of good quality images. The generative model is trained once and can then be used for any number of different forms of corruption, since it represents the probability that an image of a given type of imaging is of good quality. Images with different sources of artifacts share the same good-quality ideal. The generative model is also independent of acquisition parameters, such as the settings of the imaging system used to acquire the image: images from different acquisitions with different settings also share the same good-quality ideal. As a result, the optimization is robust to different scanner settings, and separate generative models are not needed for different artifacts and/or scanner configurations.
The terms of the inverse solution function are typically balanced in a way specific to the problem (e.g., source of distortion) and imaging system. Rather than requiring hand-tuning of the balance for many different situations, the generative model sets the regularization strength by the amount of corruption. The regularization is self-weighting. The physics model is calibrated to the imaging system, so experimentation to relatively weight the terms is not needed. Relative weighting may be provided in other embodiments.
The system includes an imaging system 80, a processor 82, a memory 84, a display 86, a communications network 87, a server 88, and a database 90. Additional, different, or fewer components may be provided. For example, network connections or interfaces are provided, such as for networking with a medical imaging network or data archival system. In another example, a user interface is provided. As another example, the server 88 and database 90 are not provided, or only the server 88 and database 90 are provided. In other examples, the server 88 connects through the network 87 with many imaging systems 80 and/or processors 82.
The processor 82, memory 84, and display 86 are part of the medical imaging system 80. Alternatively, the processor 82, memory 84, and display 86 are part of an archival and/or image processing system, such as associated with a medical records database workstation or server, separate from the imaging system 80. In other embodiments, the processor 82, memory 84, and display 86 are a personal computer, such as desktop or laptop, a workstation, a server, a network, or combinations thereof. The processor 82, display 86, and memory 84 may be provided without other components for acquiring data by scanning a patient.
The imaging system 80, processor 82, memory 84 and display 86 are provided at a same location. The location may be a same room, same building, or same facility. These devices are local relative to each other and are remote to the server 88. The server 88 is spaced apart by the network 87 by being in a different facility or by being in a different city, county, state, or country. The server 88 and database 90 are remote from the location of the processor 82 and/or imaging system 80.
The imaging system 80 is a medical diagnostic imaging system. Ultrasound, computed tomography (CT), x-ray, fluoroscopy, positron emission tomography (PET), single photon emission computed tomography (SPECT), and/or magnetic resonance (MR) systems may be used. The imaging system 80 may include a transmitter and includes a detector for scanning or receiving data representative of the interior of the patient.
In one embodiment, the imaging system 80 is a CT system. An x-ray source is connected with a gantry. A detector is also connected with a gantry opposite the x-ray source. The patient is positioned between the source and detector. The source and detector are on opposite sides of the patient and rotate and/or translate about the patient. The detected x-ray energy passing through the patient is converted, reconstructed or transformed into data representing different spatial locations within the patient.
In another embodiment, the imaging system 80 is a MR system. The MR system includes a main field magnet, such as a cryomagnet, and gradient coils. A whole-body coil is provided for transmitting and/or receiving. Local coils may be used, such as for receiving electromagnetic energy emitted by atoms in response to pulses. Other processing components may be provided, such as for planning and generating transmit pulses for the coils based on the sequence and for receiving and processing the received k-space data. The received k-space data is converted into object or image space data with Fourier processing.
The memory 84 may be a graphics processing memory, a video random access memory, a random access memory, system memory, cache memory, hard drive, optical media, magnetic media, flash drive, buffer, database, combinations thereof, or other now known or later developed memory device for storing image data. The memory 84 is part of the imaging system 80, part of a computer associated with the processor 82, part of a database, part of another system, a picture archival memory, or a standalone device.
The memory 84 stores medical imaging data representing the patient, weights or values of parameters making up the layers of the machine-learnt generative model, one or more machine-learnt matrices, values used in optimization, the physics model, and/or corrected images. The memory 84 may store data during processing for application and/or may store training data.
The memory 84 or other memory is alternatively or additionally a non-transitory computer readable storage medium storing data representing instructions executable by the programmed processor 82 for training or use of a machine-learnt generative model in an inverse solution to correct distortions in an image. The instructions for implementing the processes, methods and/or techniques discussed herein are provided on non-transitory computer-readable storage media or memories, such as a cache, buffer, RAM, removable media, hard drive or other computer readable storage media. Non-transitory computer readable storage media include various types of volatile and nonvolatile storage media. The functions, acts or tasks illustrated in the figures or described herein are executed in response to one or more sets of instructions stored in or on computer readable storage media. The functions, acts or tasks are independent of the particular type of instructions set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro code and the like, operating alone, or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing, and the like.
In one embodiment, the instructions are stored on a removable media device for reading by local or remote systems. In other embodiments, the instructions are stored in a remote location for transfer through a computer network or over telephone lines. In yet other embodiments, the instructions are stored within a given computer, CPU, GPU, or system.
The processor 82 is a general processor, central processing unit, control processor, graphics processor, digital signal processor, three-dimensional rendering processor, image processor, application specific integrated circuit, field programmable gate array, digital circuit, analog circuit, combinations thereof, or other now known or later developed machine optimizing using a generative model and physics model. The processor 82 is a single device or multiple devices operating in serial, parallel, or separately. The processor 82 may be a main processor of a computer, such as a laptop or desktop computer, or may be a processor for handling some tasks in a larger system, such as in the imaging system 80. The processor 82 is configured by instructions, design, hardware, and/or software to perform the acts discussed herein.
The processor 82 is configured to perform the acts discussed above (e.g., acts 32-38). The processor 82 uses a stored matrix or stored matrices for the machine-learnt generative model. The probability is created by application of the input image to the generative model. The processor 82 uses the likelihood from application of the generative model to an image in regularizing a minimization. The physics model and generative model are used together to correct the image. The processor 82 may apply other machine-learnt classifiers, such as for providing control of the optimization and/or as a substitute classifier learnt using the generative model.
The processor 82 is configured to transmit the corrected image over the network 87, to the display 86, or to the memory 84. The processor 82 may be configured to generate a user interface for receiving corrections or verification of distortion removal.
The display 86 is a monitor, LCD, projector, plasma display, CRT, printer, or other now known or later developed device for outputting visual information. The display 86 receives images, graphics, text, quantities, or other information from the processor 82, memory 84, imaging system 80, and/or server 88. One or more corrected medical images are displayed. The images are of a region of the patient. The corrected image includes less artifact (number, intensity, visibility, or combinations thereof) than the image prior to correction, so may more accurately portray the patient for diagnosis or therapy planning.
The network 87 is a local area, wide area, enterprise, another network, or combinations thereof. In one embodiment, the network 87 is, at least in part, the Internet. Using TCP/IP communications, the network 87 provides for communication between the processor 82 and the server 88. Any format for communications may be used. In other embodiments, dedicated or direct communication is used.
The server 88 is a processor or group of processors. More than one server 88 may be provided. The server 88 is configured by hardware, firmware, and/or software. In one embodiment, the server 88 performs machine learning with training data in the database 90. The machine-learnt matrices are provided to the processor 82 for application. Alternatively, the server 88 performs the optimization of an image received from the imaging system 80 and provides the corrected image to the imaging system 80.
The database 90 is a memory, such as a bank of memories, for storing training data, such as images and respective scores. Weights or values of parameters of the generative model and/or discriminative classifier are stored in the database 90 and/or the memory 84.
While the invention has been described above by reference to various embodiments, it should be understood that many changes and modifications can be made without departing from the scope of the invention. It is therefore intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that it is the following claims, including all equivalents, that are intended to define the spirit and scope of this invention.
Claims
1. A method for correction of an image from a medical scanner, the method comprising:
- acquiring, by the medical scanner, the image representing a patient, the image having a level of artifact due to the acquisition by the medical scanner;
- determining, by a machine, a probability of the image with a deep generative machine-learnt model, the deep generative machine-learnt model comprising an invertible network having been trained with joint optimization of images and weights of the invertible network;
- minimizing, by the machine, the level of the artifact in the image, the minimizing being a function of a physics model and the probability, the physics model including a characteristic specific to the medical scanner; and
- transmitting the image output from the minimizing, the image being of the patient and from the medical scanner with the level of the artifact minimized.
2. The method of claim 1 wherein acquiring comprises acquiring the image as a computed tomography, magnetic resonance, ultrasound, positron emission tomography, or single photon emission computed tomography reconstructed image or set of tomographic measurements, the image or tomographic measurements having corruption by noise, blur, inpainting, reconstruction, or combinations thereof.
3. The method of claim 1 wherein determining the probability comprises determining the probability with the invertible network being a closed-form expression of a prior probability.
4. The method of claim 1 wherein determining comprises determining with the invertible network comprising a stack of atomic invertible blocks.
5. The method of claim 1 wherein determining comprises determining with the invertible network having been trained with the joint optimization of images and weights based, in part, on a probability density function of noise.
6. The method of claim 5 wherein determining comprises determining with the invertible network having been trained with the joint optimization of images and weights based, in part, on a determinant of a Jacobian of the invertible network.
7. The method of claim 1 wherein determining comprises determining with the invertible network having been trained with the joint optimization being one loss computed over the weights as learnable parameters of the invertible network and the images as reconstructed images.
8. The method of claim 1 wherein determining comprises determining where the invertible network was trained with stochastic gradient descent over the weights and the images with the training in sub-batches and with the loss being a total loss for the images of the sub-batches and an instance loss per instance of the images.
9. The method of claim 1 wherein minimizing comprises inversely solving with the probability as a regularizer term.
10. The method of claim 1 wherein minimizing comprises minimizing by gradient descent with the physics model in a first term and a gradient of the probability in a second term, the physics model including a distribution of the characteristic specific to the medical scanner.
11. The method of claim 1 wherein minimizing further comprises determining a gradient direction for a next iteration in the minimizing, the gradient direction determined from the deep generative machine-learnt model.
12. The method of claim 1 wherein transmitting comprises transmitting the image to a display.
Type: Grant
Filed: Sep 14, 2021
Date of Patent: Oct 22, 2024
Patent Publication Number: 20230079353
Assignee: Siemens Healthineers AG (Forchheim)
Inventors: Boris Mailhe (Plainsboro, NJ), Mariappan S. Nadar (Plainsboro, NJ), Simon Arberet (Princeton, NJ), Mahmoud Mostapha (Princeton, NJ)
Primary Examiner: Md K Talukder
Application Number: 17/474,363
International Classification: G06T 7/00 (20170101); G06N 20/00 (20190101); G06T 5/00 (20060101); G06T 5/80 (20240101);