METHOD AND SYSTEM FOR RECOGNITION OF A TARGET IN A THREE DIMENSIONAL SCENE
A method for three-dimensional reconstruction of a three-dimensional scene and target object recognition may include acquiring a plurality of elemental images of a three-dimensional scene through a microlens array; generating a reconstructed display plane based on the plurality of elemental images using three-dimensional volumetric computational integral imaging; and recognizing the target object in the reconstructed display plane by using an image recognition or classification algorithm.
This application claims the benefit of the earlier filed provisional application, U.S. Provisional Application No. 61/007,043, filed on Dec. 10, 2007, the contents of which are incorporated by reference herein in their entirety.
FIELD OF THE INVENTION

The present invention relates generally to the fields of imaging systems; three-dimensional (3D) image processing; 3D image acquisition; and systems for recognition of objects and targets.
BACKGROUND

Three-dimensional (3D) imaging and visualization techniques have been the subject of great interest. Integral imaging is a promising technology among 3D imaging techniques. Integral imaging systems use a microlens array to capture light rays emanating from 3D objects in such a way that the light rays that pass through each pickup microlens are recorded on a two-dimensional (2D) image sensor. The captured 2D image arrays are referred to as elemental images. The elemental images are 2D images, flipped in both the x and y directions, each with a different perspective of the 3D scene. To reconstruct the 3D scene optically from the captured 2D elemental images, the rays are reversely propagated from the elemental images through a display microlens array that is similar to the pickup microlens array.
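By way of illustration only, the following sketch emulates this pickup step under a crude pinhole approximation, in which each microlens is modeled as a laterally shifted viewpoint of a planar scene; the function name, the integer-pixel parallax model, and the use of np.roll are illustrative assumptions rather than a model of the actual optics.

```python
import numpy as np

def pick_up_elemental_images(scene, num_lenses, pitch_px):
    """Crude pinhole-array pickup: each lens sees the (here planar) scene
    from a laterally shifted viewpoint and records a flipped 2D image."""
    h, w = scene.shape
    elemental = np.zeros((num_lenses, num_lenses, h, w))
    for i in range(num_lenses):
        for j in range(num_lenses):
            # The shift emulates the parallax seen by lens (i, j); np.roll
            # is a stand-in for true ray-space resampling of a 3D scene.
            view = np.roll(scene, shift=(i * pitch_px, j * pitch_px), axis=(0, 1))
            elemental[i, j] = view[::-1, ::-1]  # each lens flips x and y
    return elemental
```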
In order to overcome image quality degradation introduced by optical devices used in an optical integral imaging reconstruction process, and also to obtain arbitrary perspectives within the total viewing angle, computational integral imaging reconstruction techniques have been proposed (see H. Arimoto and B. Javidi, "Integral three-dimensional imaging with digital reconstruction," Opt. Lett. 26, 157-159 (2001); A. Stern and B. Javidi, "Three-dimensional image sensing and reconstruction with time-division multiplexed computational integral imaging," Appl. Opt. 42, 7036-7042 (2003); M. Martinez-Corral, B. Javidi, R. Martinez-Cuenca, and G. Saavedra, "Integral imaging with improved depth of field by use of amplitude modulated microlens array," Appl. Opt. 43, 5806-5813 (2004); S.-H. Hong, J.-S. Jang, and B. Javidi, "Three-dimensional volumetric object reconstruction using computational integral imaging," Opt. Express 12, 483-491 (2004), www.opticsexpress.org/abstract.cfm?URI=OPEX-12-3-483; and S. Yeom, B. Javidi, and E. Watson, "Photon counting passive 3D image sensing for automatic target recognition," Opt. Express 13, 9310-9330 (2005), www.opticsinfobase.org/abstract.cfm?URI=oe-13-23-9310).
The reconstructed high-resolution image that can be obtained with such resolution-improvement techniques is still an image reconstructed from a single viewpoint. Recently, a volumetric computational integral imaging reconstruction method has been proposed, which uses all of the information in the elemental images to reconstruct the full 3D volume of a scene. It allows one to reconstruct 3D voxel values at any arbitrary distance from the display microlens array.
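A minimal sketch of the shift-and-sum form of such a volumetric reconstruction follows, assuming a virtual pinhole model in which the disparity between neighboring elemental images scales as g/z (g being the gap between the pinhole array and the image plane); the exact magnification relation and the wrap-around shifts are simplifying assumptions.

```python
import numpy as np

def reconstruct_plane(elemental, z, g, pitch_px):
    """Shift-and-sum volumetric reconstruction at distance z.

    Each elemental image is inversely projected through its virtual
    pinhole; at depth z, projections from neighboring pinholes overlap
    with a relative disparity of roughly pitch_px * g / z pixels.
    Averaging the overlapped projections focuses objects located at
    depth z and smears out occluders at other depths.
    """
    ny, nx, h, w = elemental.shape
    disparity = pitch_px * g / z
    plane = np.zeros((h, w))
    for i in range(ny):
        for j in range(nx):
            dy = int(round(i * disparity))
            dx = int(round(j * disparity))
            img = elemental[i, j, ::-1, ::-1]        # undo the pickup flip
            plane += np.roll(img, (dy, dx), axis=(0, 1))
    return plane / (ny * nx)
```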
In a complex scene, some of the foreground objects may occlude the background objects, which prevents full observation of the background objects. To reconstruct the image of the occluded background objects with minimum interference from the occluding objects, multiple images with various perspectives are required. To achieve this goal, a volumetric integral imaging (II) reconstruction technique with inverse projection of the elemental images has been applied to the occluded scene problem (see S.-H. Hong and Bahram Javidi, "Three-dimensional visualization of partially occluded objects using integral imaging," IEEE J. Display Technol. 1, 354-359 (2005)).
Many pattern recognition problems can be solved with the correlation approach. To be distortion tolerant, the correlation filter should be designed with a training data set of reference targets so as to recognize the target viewed from various rotation angles, perspectives, scales, and illuminations. A variety of composite filters have been proposed, differing in their optimization criteria. An optimum nonlinear distortion tolerant filter is obtained by optimizing the filter's discrimination capability and noise robustness to detect targets placed in non-overlapping (disjoint) background noise. The filter is designed to maintain fixed output peaks for the members of the true class training target set. Because the nonlinear filter is derived to minimize the mean square error of the output energy in the presence of disjoint background noise and additive overlapping noise, the output energy is minimized in response to the input scene, which may include false class objects.
One of the challenging problems in pattern recognition is partial occlusion of objects, which can seriously degrade system performance. This problem has mostly been addressed with specific algorithms, such as statistical techniques or contour analysis, applied to the partially occluded 2D image. Some approaches assume that the objects are planar and represented by binary values. Scenes involving occluded objects have recently been studied using 3D integral imaging systems with computational reconstruction. The reconstructed 3D object in the occluded scene can be correlated with the original 3D object.
In view of these issues, there is a need for improvements in distortion-tolerant 3D recognition of occluded targets. At least an embodiment of a method and system for 3D recognition of an occluded target may include an optimum nonlinear filter technique to detect distorted and occluded 3D objects using volumetric computational integral imaging reconstruction.
SUMMARY OF THE INVENTION

At least an embodiment of a method for three-dimensional reconstruction of a three-dimensional scene and target object recognition may include acquiring a plurality of elemental images of a three-dimensional scene through a microlens array; generating a reconstructed display plane based on the plurality of elemental images using three-dimensional volumetric computational integral imaging; and recognizing the target object in the reconstructed display plane by using an image recognition or classification algorithm.
At least an embodiment of a system for three-dimensional reconstruction of a three-dimensional scene and target object recognition may include a CCD camera structured to record a plurality of elemental images; a microlens array positioned between the CCD camera and the three-dimensional scene; a processor connected to the CCD camera, the processor being structured to generate a reconstructed display plane based on the plurality of elemental images using three-dimensional volumetric computational integral imaging and structured to recognize the target object in the reconstructed display plane by using an image recognition or classification algorithm.
Embodiments will now be described, by way of example only, with reference to the accompanying drawings, which are meant to be exemplary, not limiting, and wherein like elements are numbered alike in the several figures.
Since it is possible to reconstruct display planes 68 of interest with volumetric computational integral imaging reconstruction, it is possible to separate the reconstructed background objects 60 from the reconstructed foreground objects 62. In other words, it is possible to reconstruct the image of the original background object 10 with a reduced effect of the original foreground occluding objects 12. However, there is a constraint on the distance between the foreground objects 12 and the background objects 10. The minimum distance between the occluding object and a pixel on the background object is d0 × lc/[(n − 1)p], where d0 is the distance between the virtual pinhole array and the pixel of the background object, lc is the length of the occluding foreground object, p is the pitch of the virtual pinhole, and n is the rhombus index number which defines a volume in the reconstructed volume.
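As a worked example of this constraint (reading the expression as d0 × lc/[(n − 1)p], and using purely hypothetical dimensions):

```python
# Hypothetical values: d0 = 60 mm from the virtual pinhole array to the
# background pixel, occluder length lc = 10 mm, pinhole pitch p = 1.09 mm,
# and rhombus index n = 5.
d0, lc, p, n = 60.0, 10.0, 1.09, 5
min_distance = d0 * lc / ((n - 1) * p)  # minimum occluder-to-background distance
print(f"minimum distance: {min_distance:.1f} mm")  # ~137.6 mm
```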
As described in detail below, ri(t) denotes one of the distorted reference targets, where i = 1, 2, . . . , T, and T is the size of the reference target set. The input image s(t), which may include distorted targets, is

$$s(t) = \sum_{i=1}^{T} v_i\, r_i(t - \tau_i) + n_b(t)\left[w(t) - \sum_{i=1}^{T} v_i\, w_{ri}(t - \tau_i)\right] + n_a(t)\, w(t), \tag{1}$$
where vi is a binary random variable taking the value 0 or 1, whose probability mass functions are p(vi = 1) = 1/T and p(vi = 0) = 1 − 1/T. In Eq. (1), vi indicates whether the target ri(t) is present in the scene or not; nb(t) is the non-overlapping background noise with mean mb; na(t) is the overlapping additive noise with mean ma; w(t) is the window function for the entire input scene; wri(t) is the window function for the reference target ri(t); and τi is a uniformly distributed random location of the target in the input scene, whose probability density function is f(τi) = w(τi)/d (d is the area of the support region of the input scene). nb(t) and na(t) are assumed to be wide-sense stationary random processes that are statistically independent of each other.
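The roles of the terms in this model can be made concrete with a short sketch that synthesizes s(t) per Eq. (1) for one-dimensional signals; all signal shapes and noise parameters below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 256, 3                        # number of sample pixels, reference-set size

# Illustrative reference targets r_i(t), each on a 32-sample support.
r = np.zeros((T, M))
w_r = np.zeros((T, M))
for i in range(T):
    r[i, :32] = (i + 1) * np.hanning(32)
    w_r[i, :32] = 1.0                # window function w_ri(t) of target i

w = np.ones(M)                       # window function w(t) of the input scene
v = (rng.random(T) < 1.0 / T).astype(float)   # p(v_i = 1) = 1/T
tau = rng.integers(0, M, size=T)     # uniformly distributed target locations
n_b = 0.2 + 0.05 * rng.standard_normal(M)     # background noise, mean m_b = 0.2
n_a = 0.05 * rng.standard_normal(M)           # additive noise, mean m_a = 0

targets = sum(v[i] * np.roll(r[i], tau[i]) for i in range(T))
disjoint = w - sum(v[i] * np.roll(w_r[i], tau[i]) for i in range(T))
s = targets + n_b * disjoint + n_a * w        # Eq. (1)
```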
The filter is designed so that, when the input to the filter is one of the reference targets, the output of the filter in its Fourier-domain expression becomes

$$\sum_{k=0}^{M-1} H(k)\, R_i^{*}(k) = M C_i, \tag{2}$$
where H(k) and Ri(k) are the discrete Fourier transforms of h(t) (the impulse response of the distortion tolerant filter) and ri(t), respectively, * denotes the complex conjugate, M is the number of sample pixels, and Ci is a positive real desired constant. Equation (2) is the constraint imposed on the filter. To obtain noise robustness, the output energy due to the disjoint background noise and the additive noise is minimized. Both the disjoint background noise and the additive noise can be integrated and represented in one noise term as

$$n(t) = n_b(t)\left[w(t) - \sum_{i=1}^{T} v_i\, w_{ri}(t - \tau_i)\right] + n_a(t)\, w(t). \tag{3}$$
A linear combination of the output energy due to the input noise and the output energy due to the input scene is minimized under the filter constraint in Eq. (2).
Let ak + jbk be the k-th element of H(k), let cik + jdik be the k-th element of Ri(k), and let D(k) = (wn E|N(k)|² + wd |S(k)|²)/M, in which E is the expectation operator, N(k) is the Fourier transform of n(t), S(k) is the Fourier transform of s(t), and wn and wd are the positive weights of the noise robustness capability and the discrimination capability, respectively. The problem is now to minimize

$$J = \sum_{k=0}^{M-1} D(k)\left(a_k^{2} + b_k^{2}\right) \tag{4}$$
with the real and imaginary parts constrained, because MCi is a real constant in Eq. (2). Lagrange multipliers are used to solve this minimization problem. Let the function to be minimized with the Lagrange multipliers λ1i, λ2i be

$$J' = \sum_{k=0}^{M-1} D(k)\left(a_k^{2} + b_k^{2}\right) - \sum_{i=1}^{T} \lambda_{1i}\left[\sum_{k=0}^{M-1}\left(a_k c_{ik} + b_k d_{ik}\right) - M C_i\right] + \sum_{i=1}^{T} \lambda_{2i} \sum_{k=0}^{M-1}\left(b_k c_{ik} - a_k d_{ik}\right).$$
One must find the ak, bk, λ1i, and λ2i that satisfy the filter constraints. The values of ak and bk that minimize J and satisfy the required constraints are

$$a_k = \frac{1}{2D(k)} \sum_{i=1}^{T}\left(\lambda_{1i} c_{ik} + \lambda_{2i} d_{ik}\right), \qquad b_k = \frac{1}{2D(k)} \sum_{i=1}^{T}\left(\lambda_{1i} d_{ik} - \lambda_{2i} c_{ik}\right). \tag{5}$$
The following additional notation is used to complete the derivation:

$$\lambda_1 = \left(\lambda_{11}, \ldots, \lambda_{1T}\right)^{t}, \quad \lambda_2 = \left(\lambda_{21}, \ldots, \lambda_{2T}\right)^{t}, \quad C = \left(C_1, \ldots, C_T\right)^{t},$$

$$A_{x,y} = \sum_{k=0}^{M-1} \frac{\operatorname{Re}\left[R_x(k)\, R_y^{*}(k)\right]}{2D(k)}, \qquad B_{x,y} = \sum_{k=0}^{M-1} \frac{\operatorname{Im}\left[R_x(k)\, R_y^{*}(k)\right]}{2D(k)},$$
where superscript t denotes the matrix transpose, and Re(·), Im(·) denote the real and imaginary parts, respectively. Let A and B be T×T matrices whose elements at (x, y) are Ax,y and Bx,y, respectively. Substituting ak and bk into the filter constraints and solving for λ1i and λ2i yields
$$\lambda_1^{t} = M C^{t}\left(A + B A^{-1} B\right)^{-1}, \qquad \lambda_2^{t} = M C^{t}\left(A + B A^{-1} B\right)^{-1} B A^{-1}. \tag{6}$$
From Eqs. (5) and (6), the k-th element of the distortion tolerant filter H(k) is obtained:

$$H(k) = \frac{1}{2D(k)} \sum_{i=1}^{T}\left(\lambda_{1i} - j \lambda_{2i}\right) R_i(k). \tag{7}$$
Both wn and wd in D(k) are chosen as M/2. Therefore, the optimum nonlinear distortion tolerant filter H(k) is

$$H(k) = \frac{\displaystyle\sum_{i=1}^{T}\left(\lambda_{1i} - j\lambda_{2i}\right) R_i(k)}{\displaystyle\frac{1}{MT}\sum_{i=1}^{T}\left(\Phi_{b0}(k) \otimes \left\{\left|W(k)\right|^{2} + \left|W_{ri}(k)\right|^{2} - \frac{2\left|W(k)\right|^{2}}{d}\operatorname{Re}\left[W_{ri}(k)\right]\right\}\right) + \frac{1}{M}\Phi_{a0}(k) \otimes \left|W(k)\right|^{2} + \frac{1}{T}\sum_{i=1}^{T}\left(m_b^{2}\left\{\left|W(k)\right|^{2} + \left|W_{ri}(k)\right|^{2} - \frac{2\left|W(k)\right|^{2}}{d}\operatorname{Re}\left[W_{ri}(k)\right]\right\} + 2 m_a m_b \left|W(k)\right|^{2}\operatorname{Re}\left[1 - \frac{W_{ri}(k)}{d}\right]\right) + m_a^{2}\left|W(k)\right|^{2} + \left|S(k)\right|^{2}}, \tag{8}$$
where Φb0(k) is the power spectrum of the zero-mean stationary random process nb0(t), Φa0(k) is the power spectrum of the zero-mean stationary random process na0(t), and W(k) and Wri(k) are the discrete Fourier transforms of w(t) and wri(t), respectively. ⊗ denotes a convolution operator. λ1i and λ2i are obtained from Eq. (6).
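For illustration, the following numerical sketch solves the same constrained minimization (minimize Σk D(k)|H(k)|² subject to Σk H(k)Ri*(k) = MCi) by solving a complex Gram system directly, rather than through the explicit λ1i, λ2i algebra above; the stand-in spectral weight D(k) used in the example is an assumption and is not the full denominator of Eq. (8).

```python
import numpy as np

def optimum_filter(refs_fft, D, C=None):
    """Minimum output-energy filter with hard correlation-peak constraints.

    Minimizes sum_k D(k)|H(k)|^2 subject to sum_k H(k) R_i(k)* = M C_i for
    each reference target. refs_fft: (T, M) complex array of the R_i(k);
    D: (M,) positive spectral weights; C: (T,) desired real output peaks.
    Solving the complex system enforces both the real-part constraint and
    a zero imaginary part, the role played by lambda_1i and lambda_2i.
    """
    T, M = refs_fft.shape
    C = np.ones(T) if C is None else C
    # The optimal H lies in the span of R_i(k)/D(k): H = sum_i mu_i R_i / D.
    gram = (refs_fft / D) @ refs_fft.conj().T     # G[x, y] = sum_k R_x R_y* / D
    mu = np.linalg.solve(gram.T, M * C.astype(complex))
    return (mu[:, None] * refs_fft).sum(axis=0) / D

# Usage with random stand-in targets and an arbitrary spectral weight:
rng = np.random.default_rng(1)
T, M = 4, 512
refs = np.fft.fft(rng.standard_normal((T, M)), axis=1)
D = np.abs(refs).mean(axis=0) ** 2 + 1.0          # stand-in for 2D(k) of Eq. (8)
H = optimum_filter(refs, D)
print((H * refs[0].conj()).sum() / M)             # ~ C_0 = 1, per the constraint
```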
While the embodiment described above discusses an optimum nonlinear filter, it will be appreciated that this is not a necessary feature; any suitable image recognition or classification algorithm can be used. In at least one embodiment, a classification algorithm can be used before an image recognition algorithm. For example, a classification algorithm could first classify a target object as either a car or a truck, and an image recognition algorithm could then further classify the object into a particular type of car or truck.
Additionally, at least the above embodiment describes a distortion tolerant algorithm. Distortion in this context can mean that the target object differs in some way from a reference object used for identification. For example, the target object may be rotated (e.g., in-plane or out-of-plane rotation), it may differ in scale or magnification from the reference object, it may be viewed from a different perspective than the reference object, or it may be illuminated differently than the reference object. It will be understood that the distortion tolerant algorithm is not limited to these examples, and that there are other possible types of distortion with which the distortion tolerant algorithm would work.
These dimensions are indicated only to summarize the conditions of one particular experimental setup, and are not meant to be limiting in any way.
To compare the performance of a filter for various schemes, a peak-intensity-to-sidelobe ratio (PSR) is used. The PSR is a ratio of the target peak intensity to the highest sidelobe intensity:
PSR=peak intensity/highest sidelobe intensity
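A sketch of this metric follows; the exclusion of a small guard window around the detected peak when searching for the highest sidelobe is an assumption, as the window size is not specified in the text.

```python
import numpy as np

def psr(correlation_plane, guard=5):
    """Peak-intensity-to-sidelobe ratio of a 2D correlation output."""
    intensity = np.abs(correlation_plane) ** 2
    py, px = np.unravel_index(np.argmax(intensity), intensity.shape)
    peak = intensity[py, px]
    sidelobes = intensity.copy()
    sidelobes[max(0, py - guard):py + guard + 1,
              max(0, px - guard):px + guard + 1] = 0.0  # mask the peak lobe
    return peak / sidelobes.max()
```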
Using a conventional 2D optimum filter for the 2D scene, the output peak intensity of the red occluded car is 0.0076. The PSR of the 2D correlation for the occluded input scene is 1.5431.
In the experiments for recognition with 3D volumetric reconstruction, an integral imaging system with a lenslet array (pitch p = 1.09 mm, focal length 3 mm) is used to pick up the elemental images. The cars are located at the same distance from the lenslet array as in the previous experiment to obtain a 19×94 elemental image array. The resolution of each elemental image is 66×66 pixels.
A digital 3D reconstruction was performed in order to obtain the original left car 6.
The output peak intensity of the left car 6 is 0.1853, and the PSR for the output plane showing the left car 6 exceeds the PSR of the 2D correlation of the occluded input scene.
These experimental results show that the performance of the proposed recognition system with 3D volumetric reconstruction for occluded objects is superior to the performance of the correlation of the occluded 2D images.
The pickup microlens array 20 is placed in front of the object to form the elemental image array.
The striped car 112 is a true class target, and the solid car 114 is a false object. In other words, it is desired to detect only the striped car 112 in a scene that contains both the solid car 114 and the striped car 112. Because of the similarity of the shapes of the cars used in the experiments, it is difficult to detect the target object with linear filters. Seven different elemental image sets are obtained by rotating the reference target from 30° to 60° in 5° increments.
From each elemental image set with rotated targets, images are reconstructed from z = 60 mm to z = 72 mm in 1 mm increments. Therefore, for each rotation angle (from 30° to 60° in 5° increments), 13 reconstructed images are used as a 3D training reference target. As the rotation angle increases, more of the side view of the object is observed and less of the frontal view. The input elemental images contain either a true class training target, or a true class non-training target and a false object (solid car 114). The true class training target is a set of 13 reconstructed images of the striped car 112 rotated at 45°. The true class non-training target is a set of 13 reconstructed images of the striped car 112 rotated at 32.5°, which is not among the training reference targets. The true class training and non-training targets are located on the right side of the input scene, and the false object is located on the left side of the scene. The true class non-training target used in the test is distorted in terms of out-of-plane rotation, which is challenging to detect.
The distortion tolerant optimum nonlinear filter has been constructed in a 4D structure, that is, x, y, z coordinates (i.e., spatial coordinates) and 3 color components.
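One plausible organization of such a 4D filter is sketched below, assuming that the per-plane, per-color Fourier-domain correlations are summed into a single output plane; the array shapes and the summation rule are assumptions for illustration.

```python
import numpy as np

def correlate_4d(filter_fft_4d, scene_4d):
    """Sum per-slice Fourier-domain correlations over z-planes and colors."""
    Z, CH, h, w = scene_4d.shape
    out = np.zeros((h, w))
    for z in range(Z):
        for c in range(CH):
            S = np.fft.fft2(scene_4d[z, c])
            out += np.real(np.fft.ifft2(filter_fft_4d[z, c] * S))
    return out

# Example shapes: 13 reconstructed z-planes, 3 color channels, 66x66 pixels.
scene_4d = np.zeros((13, 3, 66, 66))
filter_fft_4d = np.ones((13, 3, 66, 66), dtype=complex)
output_plane = correlate_4d(filter_fft_4d, scene_4d)
```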
Because of the constraint on the minimum distance between the occluding object and a pixel on the background object, the experimental setup is very important for reconstructing the background image with a reduced effect of the foreground occluding objects. One of the parameters that determines the minimum distance is the density of the occluding foreground objects. If the density of the foreground objects is high, the background object should be farther from the image pickup system. If not, the background objects may not be fully reconstructed, which can result in poor recognition performance. Nevertheless, even in this case, the proposed approach gives better performance than that of 2D recognition systems [18].
Using a 3D computational volumetric integral imaging reconstruction system and a 3D distortion tolerant optimum nonlinear filtering technique, partially occluded and distorted 3D objects can be recognized in a 3D scene. The experimental results show that the background objects can be reconstructed with a reduced effect of the occluding foreground. With the distortion tolerant 4D optimum nonlinear filter (3D coordinates plus color), recognition of rotated 3D targets is achieved even when the input scene contains false objects and is partially occluded by foreground objects such as vegetation.
The above description discusses the methods and systems in the context of visible light imaging. However, it will also be understood that the above methods and systems can also be used in multi-spectral applications, including, but not limited to, infrared applications as well as other suitable combinations of visible and non-visible light. For example, in the context of the embodiments described above, in at least an embodiment the plurality of elemental images may be generated using multi-spectral light or infrared light, and the CCD camera may be structured to record multi-spectral light or infrared light.
While the description above refers to particular embodiments of the present invention, it will be understood that many modifications may be made without departing from the spirit thereof. The accompanying claims are intended to cover such modifications as would fall within the true scope and spirit of the present invention.
The presently disclosed embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims, rather than the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims
1. A method for three-dimensional image reconstruction and target object recognition comprising:
- acquiring a plurality of elemental images of a three-dimensional scene through a microlens array;
- generating a reconstructed display plane based on the plurality of elemental images using three-dimensional volumetric computational integral imaging; and
- recognizing the target object in the reconstructed display plane by using a three-dimensional optimum nonlinear filter H(k);
- wherein the three-dimensional optimum nonlinear filter H(k) is given by the equation:

$$H(k) = \frac{\displaystyle\sum_{i=1}^{T}\left(\lambda_{1i} - j\lambda_{2i}\right) R_i(k)}{\displaystyle\frac{1}{MT}\sum_{i=1}^{T}\left(\Phi_{b0}(k) \otimes \left\{\left|W(k)\right|^{2} + \left|W_{ri}(k)\right|^{2} - \frac{2\left|W(k)\right|^{2}}{d}\operatorname{Re}\left[W_{ri}(k)\right]\right\}\right) + \frac{1}{M}\Phi_{a0}(k) \otimes \left|W(k)\right|^{2} + \frac{1}{T}\sum_{i=1}^{T}\left(m_b^{2}\left\{\left|W(k)\right|^{2} + \left|W_{ri}(k)\right|^{2} - \frac{2\left|W(k)\right|^{2}}{d}\operatorname{Re}\left[W_{ri}(k)\right]\right\} + 2 m_a m_b \left|W(k)\right|^{2}\operatorname{Re}\left[1 - \frac{W_{ri}(k)}{d}\right]\right) + m_a^{2}\left|W(k)\right|^{2} + \left|S(k)\right|^{2}};$$
- wherein T is the size of a reference target set;
- λ1i and λ2i are Lagrange multipliers;
- Ri(k) is a discrete Fourier transform of an impulse response of a distorted reference target;
- M is a number of sample pixels;
- d is an area of a support region of the three-dimensional scene;
- Re[ ] is an operator indicating the real part of an expression;
- ma is a mean of overlapping additive noise;
- mb is a mean of non-overlapping background noise;
- Φb0 (k) is a power spectrum of a zero-mean stationary random process nb0 (t), and Φa0 (k) is a power spectrum of a zero-mean stationary random process na0 (t);
- S(k) is a Fourier transform of an input image;
- W(k) is a discrete Fourier transform of a window function for the three-dimensional scene;
- Wri(k) is a discrete Fourier transform of a window function for the reference target; and
- ⊗ denotes a convolution operator.
2. A system for three-dimensional reconstruction of a three-dimensional scene and target object recognition, comprising:
- a CCD camera structured to record a plurality of elemental images;
- a microlens array positioned between the CCD camera and the three-dimensional scene;
- a processor connected to the CCD camera, the processor being structured to generate a reconstructed display plane based on the plurality of elemental images using three-dimensional volumetric computational integral imaging and structured to recognize the target object in the reconstructed display plane by using a three-dimensional optimum nonlinear filter H(k);
- wherein the three-dimensional optimum nonlinear filter H(k) is given by the equation:

$$H(k) = \frac{\displaystyle\sum_{i=1}^{T}\left(\lambda_{1i} - j\lambda_{2i}\right) R_i(k)}{\displaystyle\frac{1}{MT}\sum_{i=1}^{T}\left(\Phi_{b0}(k) \otimes \left\{\left|W(k)\right|^{2} + \left|W_{ri}(k)\right|^{2} - \frac{2\left|W(k)\right|^{2}}{d}\operatorname{Re}\left[W_{ri}(k)\right]\right\}\right) + \frac{1}{M}\Phi_{a0}(k) \otimes \left|W(k)\right|^{2} + \frac{1}{T}\sum_{i=1}^{T}\left(m_b^{2}\left\{\left|W(k)\right|^{2} + \left|W_{ri}(k)\right|^{2} - \frac{2\left|W(k)\right|^{2}}{d}\operatorname{Re}\left[W_{ri}(k)\right]\right\} + 2 m_a m_b \left|W(k)\right|^{2}\operatorname{Re}\left[1 - \frac{W_{ri}(k)}{d}\right]\right) + m_a^{2}\left|W(k)\right|^{2} + \left|S(k)\right|^{2}};$$
- wherein T is the size of a reference target set;
- λ1i and λ2i are Lagrange multipliers;
- Ri(k) is a discrete Fourier transform of an impulse response of a distorted reference target;
- M is a number of sample pixels;
- d is an area of a support region of the three-dimensional scene;
- Re[ ] is an operator indicating the real part of an expression;
- ma is a mean of overlapping additive noise;
- mb is a mean of non-overlapping background noise;
- Φb0 (k) is a power spectrum of a zero-mean stationary random process nb0 (t), and Φa0 (k) is a power spectrum of a zero-mean stationary random process na0 (t);
- S(k) is a Fourier transform of an input image;
- W(k) is a discrete Fourier transform of a window function for the three-dimensional scene;
- Wri(k) is a discrete Fourier transform of a window function for the reference target; and
- ⊗ denotes a convolution operator.
3. A method for three-dimensional reconstruction of a three-dimensional scene and target object recognition comprising:
- acquiring a plurality of elemental images of a three-dimensional scene through a microlens array;
- generating a reconstructed display plane based on the plurality of elemental images using three-dimensional volumetric computational integral imaging; and
- recognizing the target object in the reconstructed display plane by using an image recognition or classification algorithm.
4. The method of claim 3, wherein the three-dimensional scene comprises a background object and foreground object, wherein the foreground object at least partially occludes, obstructs, or distorts the background object.
5. The method of claim 3, wherein the generating a reconstructed display plane comprises inverse mapping through a virtual pinhole array.
6. The method of claim 3 wherein the generating a reconstructed display plane is repeated for a plurality of reconstruction planes to thereby generate a reconstructed three-dimensional scene.
7. The method of claim 4 wherein the effect of the occlusion, obstruction, or distortion caused by the foreground object is minimized when recognizing the target object.
8. The method of claim 3 wherein the three-dimensional scene comprises an object of military, law enforcement, or security interest.
9. The method of claim 3 wherein the three-dimensional scene comprises an object of scientific, biological, or medical interest.
10. The method of claim 3, wherein the image recognition or classification algorithm is an optimum nonlinear filter.
11. The method of claim 10, wherein the optimum nonlinear filter is constructed in a four dimensional structure.
12. A system for three-dimensional reconstruction of a three-dimensional scene and target object recognition, comprising:
- a CCD camera structured to record a plurality of elemental images;
- a microlens array positioned between the CCD camera and the three-dimensional scene;
- a processor connected to the CCD camera, the processor being structured to generate a reconstructed display plane based on the plurality of elemental images using three-dimensional volumetric computational integral imaging and structured to recognize the target object in the reconstructed display plane by using an image recognition or classification algorithm.
13. The system of claim 12, wherein the image recognition or classification algorithm is an optimum nonlinear filter.
14. The system of claim 13, wherein the optimum nonlinear filter is constructed in a four-dimensional structure.
15. The system of claim 12, wherein the processor is structured to generate the reconstructed display plane by inverse mapping through a virtual pinhole array.
16. The method of claim 10, wherein the optimum nonlinear filter is a distortion-tolerant optimum nonlinear filter.
17. The system of claim 13, wherein the optimum nonlinear filter is a distortion-tolerant optimum nonlinear filter.
18. The method of claim 16, wherein the distortion-tolerant optimum nonlinear filter is designed with a training data set of reference targets to recognize the target object when viewed from various rotated angles, perspectives, scales, or illuminations.
19. The system of claim 17, wherein the distortion-tolerant optimum nonlinear filter is designed with a training data set of reference targets to recognize the target object when viewed from various rotated angles, perspectives, scales, or illuminations.
20. The method of claim 3, wherein the plurality of elemental images are generated using multi-spectral light.
21. The method of claim 3, wherein the plurality of elemental images are generated using infrared light.
22. The system of claim 12, wherein the CCD camera is structured to record multi-spectral light.
23. The system of claim 12, wherein the CCD camera is structured to record infrared light.
24. The method of claim 11, wherein the four-dimensional structure of the optimum nonlinear filter includes spatial coordinates and a color component.
25. The system of claim 14, wherein the four-dimensional structure of the optimum nonlinear filter includes spatial coordinates and a color component.
Type: Application
Filed: Dec 10, 2008
Publication Date: Jun 25, 2009
Applicant: THE UNIVERSITY OF CONNECTICUT (Farmington, CT)
Inventors: Bahram Javidi (Storrs, CT), Seung-Hyun Hong (Storrs, CT)
Application Number: 12/331,984
International Classification: H04N 5/335 (20060101); G06T 15/00 (20060101); G06K 9/00 (20060101);