Deep Learning Based Multi-Sensor Detection System for Executing a Method to Process Images from a Visual Sensor and from a Thermal Sensor for Detection of Objects in Said Images
A Deep Learning based Multi-sensor Detection System for executing a method to process images from a visual sensor and from a thermal sensor for detection of objects in said images, wherein a first deep learning network for processing images from the visual sensor and a second deep learning network for processing images from the thermal sensor are jointly used and collaboratively trained for improving both networks' ability to accurately detect said objects in said images.
The invention relates to improving a Deep Learning based Multi-sensor Detection System for executing a method to process images from a visual sensor and from a thermal sensor for detection of objects in said images.
Such a Deep Learning based Multi-sensor Detection System is used to improve object recognition in images. Deep Learning based object detection forms the core of autonomous driving systems and uses the images from the sensor to detect multiple objects such as vehicles, pedestrians, and obstructions. These predictions are used to make significant decisions in real time and hence need to be highly accurate and consistent at all times of day and across seasons, weather, and other external influences.
A problem in such object recognition in images is that low lighting, adverse weather conditions such as rain and snow, or other effects such as glare due to high beams lead to a decline in the image quality of the visual cameras. Hence, while object detection networks achieve high accuracy during daytime and under good illumination conditions, variation in these factors leads to degradation in performance.
Background Art

K. Agrawal and A. Subramanian, "Enhancing object detection in adverse conditions using thermal imaging," arXiv preprint arXiv:1909.13551, 2019 proposed a trained network using both RGB and thermal data. This approach did not provide much improvement in overall accuracy.
R. Yadav, A. Samir, H. Rashed, S. Yogamani, and R. Dahyot, “Cnn based color and thermal image fusion for object detection in automated driving,” Irish Machine Vision and Image Processing, 2020 proposed an architecture to fuse visual and thermal images for detection where the features from two networks are extracted and merged in the last convolution layer before feeding it to the decoder for detection. This two-stream network is computationally expensive and the simple fusion logic falls short in complex data scenarios.
C. Li, D. Song, R. Tong, and M. Tang, "Illumination-aware faster r-cnn for robust multispectral pedestrian detection," Pattern Recognition, vol. 85, pp. 161-171, 2019 proposed to fuse RGB and thermal data at different layers of the network, but these methods require paired images from both modalities at inference, which limits their application.
All of the above approaches perform simple fusion to obtain one representation from two data modalities having different distributions, which leads to suboptimal performance.
Note that this application refers to a number of references. Such references are not to be considered as prior art vis-a-vis the present invention. Discussion of such references herein is given for more complete background and is not to be construed as an admission that such references are prior art for patentability determination purposes.
BRIEF SUMMARY OF THE INVENTION

Embodiments of the present invention are directed to a system wherein a first deep learning network for processing images from the visual sensor and a second deep learning network for processing images from the thermal sensor are jointly used and collaboratively trained for improving both networks' ability to accurately detect the objects in said images. In other words: the Deep Learning based Multi-sensor Detection System of the invention learns from data from at least two different sensors by jointly and collaboratively training the at least two deep learning networks, one on images from a visual camera sensor and another on thermal data from a thermal sensor, to improve an object detector's performance across varying lighting and weather conditions. The visual images used in this computer implemented method provide detailed visual cues, which are complemented by the thermal images, which offer semantic information on objects that might be occluded or less visible in the corresponding visual image. The invention thus integrates the data from the visual and thermal sensors to train a detection system that produces consistent detections irrespective of the ambient lighting or weather.
Favourably the first deep learning network for processing images from the visual sensor and the second deep learning network for processing images from the thermal sensor receive visual data and thermal data, respectively, that are derived from the same scene. This gives each network the flexibility to incorporate complementary knowledge from the other modality without impeding its ability to learn the optimal representation for the modality it is trained on.
In a preferred embodiment a mimicry loss is determined between the first deep learning network for processing images from the visual sensor and the second deep learning network for processing images from the thermal sensor, and used for improving the accuracy of both said networks. The mimicry loss is used to align the feature spaces of both networks and helps in each network learning complementary knowledge of the data from the other network, while a supervised loss helps in retaining the knowledge of each network's own data.
Further it is preferred that an overall loss function for each of the first network and second network is determined which is represented by the sum of the mimicry loss and the supervised detection loss of the first network and second network, respectively.
Advantageously each of the first network and the second network comprises an encoder and a detection head for localization and classification of objects in the images, wherein both the first network and the second network are provided with a decoder taking features from intermediate layers of the encoder to reconstruct the images. Reconstruction is an auxiliary task that aids in extracting all the semantic information from the data into the learned representations. Accordingly, the method of the invention is encouraged to explore the input feature space exhaustively.
There are several options to reconstruct the inputs.
In one embodiment the decoder for the visual images takes features from the encoder for the visual images, and the decoder for the thermal images takes features from the encoder for the thermal images. As an auxiliary task, this reconstruction aids in extracting all the semantic information from the data into the learned representations.
In another embodiment the decoder for the visual images takes features from the encoder for the thermal images, and the decoder for the thermal images takes features from the encoder for the visual images. Such cross reconstruction learns to use semantic information from thermal data to reconstruct visual images, thus disentangling the features to learn effective representations.
Objects, advantages and novel features, and further scope of applicability of the present invention will be set forth in part in the detailed description to follow, taken in conjunction with the accompanying drawings, and in part will become apparent to those skilled in the art upon examination of the following, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
The invention will hereinafter be further elucidated with reference to the drawing of an exemplary embodiment of a MultiModal Framework according to the invention to combine data from different sensors to provide a reliable and comprehensive detection system that is not limiting as to the appended claims. The accompanying drawings, which are incorporated into and form a part of the specification, illustrate one or more embodiments of the present invention and, together with the description, serve to explain the principles of the invention. The drawings are only for the purpose of illustrating one or more embodiments of the invention and are not to be construed as limiting the invention. In the drawing:
Whenever in the figures the same references or reference numerals are applied, these references or reference numerals refer to the same parts.
DETAILED DESCRIPTION OF THE INVENTION

With reference to the drawing:
The overall loss function per network is the sum of the detection loss and the mimicry loss. The KL divergence D_KL is applied to the soft logits p_rgb and p_thm; λ_rgb and λ_thm are the balancing weights:
L_MMC-RGB = L_det + λ_rgb · D_KL(p_rgb ∥ p_thm)

L_MMC-Thm = L_det + λ_thm · D_KL(p_thm ∥ p_rgb)
The detection loss L_det is a weighted summation of the classification and regression losses: L_det = λ_cls · L_cls + λ_reg · L_reg.
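By way of a non-limiting illustration, the per-network training objective described above can be sketched as follows. This NumPy code is merely an illustrative example and is not part of the claimed system; the function names, temperature, and balancing-weight values are hypothetical choices for the sketch, and the detection losses are passed in as precomputed scalars.

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax turning raw logits into soft logits."""
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / t)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q), summed over classes and averaged over the batch."""
    return float(np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)))

def mmc_losses(logits_rgb, logits_thm, det_loss_rgb, det_loss_thm,
               lam_rgb=0.1, lam_thm=0.1):
    """Overall loss per network: supervised detection loss plus mimicry loss.

    The mimicry term aligns each network's soft logits with the other
    modality's soft logits, as in the equations above.
    """
    p_rgb = softmax(logits_rgb)
    p_thm = softmax(logits_thm)
    loss_rgb = det_loss_rgb + lam_rgb * kl_divergence(p_rgb, p_thm)
    loss_thm = det_loss_thm + lam_thm * kl_divergence(p_thm, p_rgb)
    return loss_rgb, loss_thm
```

When both networks already agree, the mimicry term vanishes and each overall loss reduces to the supervised detection loss; any disagreement between the soft logits adds a positive penalty to both networks.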
To further encourage the method according to an embodiment of the present invention to explore the input feature space exhaustively and extract all the semantic information into the learned representations, an auxiliary task for reconstructing the inputs can be applied. The auxiliary task network takes in the features from the intermediate layers of encoders and aims to reconstruct the input image via the decoders. Hence, each of the first network and the second network comprises an encoder and a detection head for localization and classification of objects in the images, and both the first network and the second network are provided with a decoder taking features from intermediate layers of the encoder to reconstruct the images. There are two possible embodiments:
- MMC+Reconstruction
- MMC+Cross Reconstruction
In the first embodiment providing MMC+Reconstruction, the decoder for the visual images takes features from the encoder for the visual images, and the decoder for the thermal images takes features from the encoder for the thermal images. The corresponding reconstruction losses are:
L_Rec-RGB = Σ (x_rgb − Dec_rgb(Enc_rgb(x_rgb)))²

L_Rec-Thm = Σ (x_thm − Dec_thm(Enc_thm(x_thm)))²
In the second embodiment providing MMC+Cross Reconstruction, the decoder for the visual images takes features from the encoder for the thermal images, and vice versa, giving the cross reconstruction losses:

L_CrossRec-RGB = Σ (x_rgb − Dec_rgb(Enc_thm(x_thm)))²

L_CrossRec-Thm = Σ (x_thm − Dec_thm(Enc_rgb(x_rgb)))²
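The two reconstruction wirings can be sketched in a short, non-limiting example. The encoder and decoder callables below are hypothetical placeholders standing in for the trained networks, and the helper name and `cross` flag are chosen for illustration only.

```python
import numpy as np

def reconstruction_losses(x_rgb, x_thm, enc_rgb, enc_thm, dec_rgb, dec_thm,
                          cross=False):
    """Sum-of-squared-errors reconstruction losses for both modalities.

    cross=False: each decoder reconstructs its input from its own
                 modality's encoder features (MMC+Reconstruction).
    cross=True:  each decoder reconstructs its input from the *other*
                 modality's encoder features (MMC+Cross Reconstruction).
    """
    feat_for_rgb = enc_thm(x_thm) if cross else enc_rgb(x_rgb)
    feat_for_thm = enc_rgb(x_rgb) if cross else enc_thm(x_thm)
    loss_rgb = float(np.sum((x_rgb - dec_rgb(feat_for_rgb)) ** 2))
    loss_thm = float(np.sum((x_thm - dec_thm(feat_for_thm)) ** 2))
    return loss_rgb, loss_thm
```

In the cross variant, the visual decoder must recover the visual image from thermal features alone, which is what forces the learned representations to carry modality-independent semantic information.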
Optionally, embodiments of the present invention can include a general or specific purpose computer or distributed system programmed with computer software implementing steps described above, which computer software may be in any appropriate computer language, including but not limited to C++, FORTRAN, BASIC, Java, Python, assembly language, microcode, distributed programming languages, etc. The apparatus may also include a plurality of such computers/distributed systems (e.g., connected over the Internet and/or one or more intranets) in a variety of hardware implementations. For example, data processing can be performed by an appropriately programmed microprocessor, computing cloud, Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), or the like, in conjunction with appropriate memory, network, and bus elements. One or more processors and/or microcontrollers can operate via instructions of the computer code and the software is preferably stored on one or more tangible non-transitory memory-storage devices.
Although the invention has been discussed in the foregoing with reference to exemplary embodiments of the Deep Learning based Multi-sensor Detection System of the invention, the invention is not restricted to these particular embodiments which can be varied in many ways without departing from the invention. The discussed exemplary embodiments shall therefore not be used to construe the appended claims strictly in accordance therewith. On the contrary the embodiments are merely intended to explain the wording of the appended claims without intent to limit the claims to these exemplary embodiments. The scope of protection of the invention shall therefore be construed in accordance with the appended claims only, wherein a possible ambiguity in the wording of the claims shall be resolved using these exemplary embodiments.
Embodiments of the present invention can include every combination of features that are disclosed herein independently from each other. Although the invention has been described in detail with particular reference to the disclosed embodiments, other embodiments can achieve the same results. Variations and modifications of the present invention will be obvious to those skilled in the art and it is intended to cover in the appended claims all such modifications and equivalents. The entire disclosures of all references, applications, patents, and publications cited above are hereby incorporated by reference. Unless specifically stated as being “essential” above, none of the various components or the interrelationship thereof are essential to the operation of the invention. Rather, desirable results can be achieved by substituting various components and/or reconfiguration of their relationships with one another.
Claims
1. A Deep Learning based Multi-sensor Detection System for executing a method to process images from a visual sensor and from a thermal sensor for detection of objects in said images, wherein a first deep learning network for processing images from the visual sensor and a second deep learning network for processing images from the thermal sensor are jointly used and collaboratively trained for improving both networks' ability to accurately detect said objects in said images.
2. The Deep Learning based Multi-sensor Detection System of claim 1, that learns from data from at least two different sensors by jointly and collaboratively training two deep learning networks, one on images from a visual camera sensor and another on thermal data from a thermal sensor to improve an object detector's performance across varying lighting and weather conditions.
3. The Deep Learning based Multi-sensor Detection System of claim 1, wherein the first deep learning network for processing images from the visual sensor and the second deep learning network for processing images from the thermal sensor receive visual data and thermal data, respectively, that are derived from the same scene.
4. The Deep Learning based Multi-sensor Detection System of claim 1, wherein a mimicry loss is determined between the first deep learning network for processing images from the visual sensor and the second deep learning network for processing images from the thermal sensor, and used for improving the accuracy of both said networks.
5. The Deep Learning based Multi-sensor Detection System of claim 4, wherein the mimicry loss is used to align the feature spaces of both networks and helps in each network learning complementary knowledge of data from the other network, while a supervised loss helps in retaining the knowledge of a network's own data.
6. The Deep Learning based Multi-sensor Detection System of claim 4, wherein an overall loss function for each of the first network and second network is determined which is represented by the sum of the mimicry loss and the supervised detection loss of the first network and second network, respectively.
7. The Deep Learning based Multi-sensor Detection System of claim 1, wherein each of the first network and the second network comprises an encoder and a detection head for localization and classification of objects in the images, and that both the first network and the second network are provided with a decoder taking features from intermediate layers of the encoder to reconstruct the images.
8. The Deep Learning based Multi-sensor Detection System of claim 7, wherein the decoder for the visual images takes features from the encoder for the visual images, and wherein the decoder for the thermal images takes features from the encoder for the thermal images.
9. The Deep Learning based Multi-sensor Detection System of claim 7, wherein the decoder for the visual images takes features from the encoder for the thermal images, and wherein the decoder for the thermal images takes features from the encoder for the visual images.
Type: Application
Filed: Jan 21, 2022
Publication Date: Jul 27, 2023
Inventors: Shruthi Gowda (Eindhoven), Elahe Arani (Eindhoven), Bahram Zonooz (Eindhoven)
Application Number: 17/581,759