LIDAR LOCALIZATION USING OPTICAL FLOW
A method for determining a lidar sensor pose with respect to a predefined map image, comprising acquiring a lidar height map; determining an optical flow field, which relates the lidar height map and the map image; and computing a maximum-likelihood (ML) estimate of the lidar sensor pose on the basis of the determined optical flow field. The optical flow field may optionally be determined by a regression model, which additionally produces an associated variability tensor to be used in the ML estimation. In particular, the optical flow field may be determined by a trained neural network.
The present application claims priority to European Patent Application No. 20208113.9, filed on Nov. 17, 2020, and entitled “LIDAR LOCALIZATION USING OPTICAL FLOW,” which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present disclosure relates to the field of navigation technology and in particular to a method and system for accurate localization of a light detection and ranging (lidar) sensor with respect to a predefined map. Localization according to this disclosure is potentially useful in driver assistance and automated driving.
BACKGROUND
With the introduction of high-performance driver assistance systems and automated driving functionality, the requirement for precise position knowledge has increased to the point where satellite-based localization must be coupled with costly correction services to provide adequate accuracy. Even then, such solutions are subject to severe availability and reliability issues, due to various problems including multi-path signals and signal outages in challenging conditions.
A common alternative for high precision localization is to map areas of interest, and then localize relative to the pre-recorded map. The lidar sensor is robust to illumination and texture variability, which is an important advantage compared to camera sensors for the localization task. Furthermore, the lidar sensor is useful in other tasks, such as object detection and tracking, which makes it a practical choice for autonomous vehicles.
Most localization methods divide the problem into a position retrieval stage, commonly referred to as global localization, and a local refinement stage. The coarser global localization is often left to external sensors, such as satellite-based positioning or inertial techniques, though lidar-based position retrieval methods do exist. The present disclosure addresses the local refinement problem.
Early lidar localization methods used template matching to find the rigid transformation that maximizes correlation between the sensor data and the map. To achieve this, both sensor and map points are projected into two-dimensional (2D) images from a top-view perspective, and templates resulting from all transforms in a discrete search space are correlated with the map. Localization accuracy is generally sub-decimeter, but the search space must be constrained to limit the computational complexity, meaning that an accurate sensor pose prior is required.
Another option for lidar localization is to apply a point cloud registration method. The approach of a large body of registration methods is to find a set of correspondences, i.e., pairs of matching features in sensor data and map, and to compute the rigid body transformation that best aligns the sensor data with the map. Iterative closest point (ICP) methods run repeated closest-distance searches to determine correspondences, gradually approaching an alignment. ICP and related methods suffer from a tendency to converge to a local minimum when initialization is inaccurate, and are burdened by the computational cost of their repeated correspondence searches. Fast global registration (FGR), for its part, addresses such shortcomings by computing correspondences once, using local feature descriptors, and directly solves for the pose by minimizing a global objective function. FGR is fast and less affected by the local-minimum problem, but may be vulnerable to incorrect or ambiguous correspondence matches.
Recent registration literature has applied deep learning to encode better performing descriptors and to detect key points whose descriptors are likely to form accurate matches. This has led to significant improvements in descriptor performance for registration. Yet the problem of encoding point descriptors that capture both the large structure shape required for global matching and the fine detail necessary for precision localization largely remains unsolved.
SUMMARY
One objective is to propose a method for determining a lidar sensor pose with a linear and angular localization accuracy that is comparable to ICP and other high-performing methods, typically <0.04 m position and <0.1° heading angle. Another objective is to propose a lidar pose determination method that typically manages to recover position with a prior error of 20 m or more. Another objective is to propose a lidar pose determination method that is robust to ‘difficult’ scenes, with non-salient or repetitive structure. It is a particular objective to propose such a method which lends itself to implementation, in part, by a regression model. Ease of training is a desirable property of such a network. It is furthermore desirable that the network is capable of regression at different spatial scales, to enable it to handle both position recovery and high-accuracy localization. It is finally an objective to propose hardware implementing a method with the above characteristics.
At least some of these objectives are achieved by the invention as defined by the independent claims. The dependent claims are directed to advantageous embodiments of the invention.
According to a first aspect of the invention, there is provided a method for determining a lidar sensor pose with respect to a predefined map image. The method comprises the steps of acquiring a lidar height map; determining an optical flow field, which relates the lidar height map and the map image; and computing a maximum-likelihood (ML) estimate of the lidar sensor pose on the basis of the determined optical flow field.
The use of an optical flow field to find the relation between the lidar height map and the map image contributes to the performance of the method. For example, several efficient conventional algorithms exist for computing an optical flow field, and machine-learning implementations constitute another attractive option. The inventor has realized, furthermore, that the optical-flow-based approach enhances robustness and accuracy as well.
As used herein, “optical flow field”—or “flow field” for short—may be understood as the apparent motion of visual features in a sequence of two or more images. Optical flows for example cases are illustrated in the accompanying drawings.
In the present disclosure, furthermore, a “height map” may be understood as a collection of ground (or horizontal) coordinates each associated with a vertical coordinate. For example, the height map may be defined with reference to a discrete horizontal reference frame, where at least some of the horizontal cells or points are associated with a height value. If the lidar sensor does not provide output data in this format, or if several lidar sweeps are combined, the lidar data may need to undergo preliminary processing of the type to be described below. When using a lidar sensor which is operable to output both range information and intensity information, the height map is primarily to be created on the basis of the range information. The height map may be represented as a point cloud.
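By way of illustration only, the following sketch shows one possible way of rasterizing a lidar point cloud into such a height map on a discrete horizontal grid. The function name, cell size and grid extent are illustrative assumptions, not values prescribed by this disclosure.

```python
import numpy as np

def height_map_from_points(points, cell_size=0.5, grid_half_extent=50.0):
    """Rasterize a lidar point cloud (N x 3, columns x, y, z) into a 2D height map.

    Each horizontal grid cell stores the mean height of the points that fall into
    it; empty cells are NaN.  All names and parameter values are illustrative.
    """
    n_cells = int(2 * grid_half_extent / cell_size)
    height_sum = np.zeros((n_cells, n_cells))
    counts = np.zeros((n_cells, n_cells), dtype=int)

    # Map horizontal coordinates to integer cell indices (x indexes rows here).
    ij = np.floor((points[:, :2] + grid_half_extent) / cell_size).astype(int)
    inside = np.all((ij >= 0) & (ij < n_cells), axis=1)
    ij, z = ij[inside], points[inside, 2]

    np.add.at(height_sum, (ij[:, 0], ij[:, 1]), z)
    np.add.at(counts, (ij[:, 0], ij[:, 1]), 1)

    height = np.full((n_cells, n_cells), np.nan)
    occupied = counts > 0
    height[occupied] = height_sum[occupied] / counts[occupied]
    return height

# Example with synthetic points:
pts = np.random.uniform(-50, 50, size=(10000, 3))
hm = height_map_from_points(pts)
```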
According to a second aspect of the invention, there is provided a navigation system comprising: a communication interface for acquiring a lidar height map; a memory adapted for storing a predefined map image; first processing circuitry configured to determine an optical flow field, which relates the lidar height map and the map image; and second processing circuitry configured to compute an ML estimate of the lidar sensor pose on the basis of the determined optical flow field.
According to a third aspect, the invention provides a computer program containing instructions for causing a computer, or the navigation system in particular, to carry out the above method. The computer program may be stored or distributed on a data carrier. As used herein, a “data carrier” may be a transitory data carrier, such as modulated electromagnetic or optical waves, or a non-transitory data carrier. Non-transitory data carriers include volatile and non-volatile memories, such as permanent and non-permanent storages of magnetic, optical or solid-state type. Still within the scope of “data carrier”, such memories may be fixedly mounted or portable.
The first, second and third aspects of the invention generally share the same advantages and can be embodied in analogous ways.
In some embodiments, for example, the optical flow field is determined by a regression model, which additionally produces an associated variability tensor. Then, the ML estimate of the lidar sensor pose is computed further on the basis of the variability tensor. Unlike an a priori computation of the optical flow field, these embodiments may advantageously utilize any visual, topographical etc. similarities between different execution runs, thereby possibly simplifying the computations and rendering them more resilient. In these embodiments, it is furthermore advantageous to implement the regression model as a trained or trainable neural network.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to “a/an/the element, apparatus, component, means, step, etc.” are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
Aspects and embodiments are now described, by way of example, with reference to the accompanying drawings.
The aspects of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, on which certain embodiments of the invention are shown. These aspects may, however, be embodied in many different forms and should not be construed as limiting; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and to fully convey the scope of all aspects of the invention to those skilled in the art. Like numbers refer to like elements throughout the description.
The pose determination problem to be addressed by the present invention is conceptually illustrated in the drawings.
In embodiments of the invention, the images 201, 202 may correspond to the map image and height map. Such embodiments may be restricted to determining the pose correction transform as a rigid movement—rather than a general movement—since it follows from a theoretical result that rescaling, shearing and other non-rigid transformations cannot result from a change of pose of a top-view lidar sensor. (Certainly, non-static objects may have entered or left the scene, or moved.) Starting from an optical flow field, which associates image points with local translation vectors, the parameters x, y, ϕ of the pose correction transform can therefore be determined from an overdetermined system of equations. In embodiments targeting the three-dimensional case, the pose correction transform may further include corrections of the height z and a pitch or roll angle ψ. While this determination will be discussed in detail below, the underlying principle can be appreciated from simple example cases, as in the sketch that follows.
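To make the overdetermined-system idea concrete, the following sketch recovers x, y and ϕ from a flow field by a least-squares (SVD-based Procrustes) alignment of the grid cell centers with their flow end points. This is one standard estimator shown only for illustration; it is not the ML estimation described further below, and all names and values are assumptions.

```python
import numpy as np

def rigid_from_flow(p, f):
    """Recover a 2D rigid transform (R, t) from cell centers p (N x 2) and flow
    vectors f (N x 2) by least squares, using the SVD-based Procrustes solution.
    Illustrative sketch only."""
    q = p + f                       # flow end points in the map frame
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    H = (p - p_mean).T @ (q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # enforce a proper rotation (no reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Synthetic check: a flow field generated by a known rotation and translation.
rng = np.random.default_rng(0)
p = rng.uniform(-20, 20, size=(100, 2))
phi_true, t_true = np.deg2rad(5.0), np.array([1.5, -0.8])
R_true = np.array([[np.cos(phi_true), -np.sin(phi_true)],
                   [np.sin(phi_true),  np.cos(phi_true)]])
f = (p @ R_true.T + t_true) - p
R_est, t_est = rigid_from_flow(p, f)
phi_est = np.arctan2(R_est[1, 0], R_est[0, 0])   # recovers approximately 5 degrees
```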
The method 300, which determines a lidar sensor pose with respect to a predefined map image, comprises:
- an acquisition 314 of a lidar height map;
- a determination 322 of an optical flow field, which relates the lidar height map and the map image; and
- a computation 324 of an ML estimate of the lidar sensor pose on the basis of the determined optical flow field.
The method 300 solves the localization problem by first estimating an optical flow field between the sensor and map coordinate frames, and by then using the flow field to compute the sensor location, i.e., to estimate the relation to the prior location (see below) in terms of translation and angle(s). Specifically, the sensor and map cloud data may be discretized into 2D grids in top-view perspective with a feature vector for each grid cell. In one embodiment, a neural network is used to regress the optical flow field, i.e., a set of 2D vectors that estimate the translation of the center of each grid cell in the sensor image into the map coordinate frame.
It is assumed that the height map is available as a point cloud. If a 2D optical flow field is to be used, it is moreover assumed that the vertical direction of the sensor is known, so that its data can be transformed into a coordinate system whose vertical axis is aligned with the gravitational axis. Furthermore, a prior on the sensor pose, which is accurate to approximately 20 m and 20° heading angle, is obtained. Normally, such a prior is available from satellite-based localization, or from inertial odometry based on a previous localization. This corresponds to a step 310 of the method 300. The prior position defines the center point of the area of the map which is to be extracted, in step 312, for feature image construction.
Aspects of the height map acquisition 314, including suitable processing of the raw sensor data, have been described above.
To cast the problem into an optical flow formulation, the input points from sensor and map are transformed into suitable coordinate systems, such as Cartesian systems. Using the prior information of the sensor's vertical axis and its heading, a transform operator TES is defined which rotates points expressed in the sensor coordinate frame S to the error frame E, which is aligned with the map axes, but with a remaining error in heading due to the prior error. The map points are extracted from an area of the map centered at the prior position, and then translated from the map coordinate frame M to the crop frame C with its origin at the prior position, by defining an operator TMC and applying its inverse to the extracted points. The sought sensor pose transform TMS, relative to the map coordinate frame M, can be computed as the composition
TMS = TMC TCE TES,
where TCE is the to-be-computed pose correction transform that aligns the rotated sensor points with the translated map crop points.
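By way of illustration, the composition above can be evaluated with homogeneous 3×3 matrices for 2D rigid transforms. The numerical values below are placeholders only.

```python
import numpy as np

def se2(x, y, phi):
    """Homogeneous 3x3 matrix for a 2D rigid transform (rotation phi, translation x, y)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

# Illustrative values only: prior position (T_MC), estimated pose correction (T_CE)
# and heading prior (T_ES).
T_MC = se2(1234.0, 567.0, 0.0)          # crop frame origin placed at the prior position
T_ES = se2(0.0, 0.0, np.deg2rad(92.0))  # rotation from sensor frame to error frame
T_CE = se2(0.31, -0.12, np.deg2rad(1.4))

T_MS = T_MC @ T_CE @ T_ES               # sought sensor pose in the map frame
```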
The transformed point sets are partitioned into 2D grids in the horizontal plane, where the sensor grid contains Ws×Hs cells. The map grid is of a larger Wm×Hm size, so as to support flow vector end points outside the sensor grid borders. For each grid cell, a feature vector is computed. As an example, the feature vector may be defined as x=[n, z̃, σ]T, in which n is the number of points contained in the cell, z̃ represents the mean height of the points, and σ is the standard deviation of the points' vertical coordinates. This information is collected in a sensor input tensor EXs and a map input tensor CXm.
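A minimal sketch of how the per-cell feature vector [n, z̃, σ]T might be assembled into an input tensor is given below, assuming the input points are given as an N×3 array of x, y, z coordinates; grid placement and naming are illustrative assumptions.

```python
import numpy as np

def feature_image(points, cell_size, width, height):
    """Build a 3 x H x W tensor with per-cell point count, mean height and height
    standard deviation from points (N x 3).  Sketch only; the grid is centered on
    the origin of the input frame."""
    feats = np.zeros((3, height, width))
    col = np.floor(points[:, 0] / cell_size + width / 2).astype(int)
    row = np.floor(points[:, 1] / cell_size + height / 2).astype(int)
    ok = (col >= 0) & (col < width) & (row >= 0) & (row < height)
    col, row, z = col[ok], row[ok], points[ok, 2]

    # Accumulate n, sum(z) and sum(z^2), then convert to mean and standard deviation.
    np.add.at(feats[0], (row, col), 1.0)
    np.add.at(feats[1], (row, col), z)
    np.add.at(feats[2], (row, col), z ** 2)
    n = np.maximum(feats[0], 1.0)
    mean = feats[1] / n
    var = np.maximum(feats[2] / n - mean ** 2, 0.0)
    feats[1], feats[2] = mean, np.sqrt(var)
    return feats

# Example with synthetic points:
x_feat = feature_image(np.random.randn(5000, 3) * 10, cell_size=0.5, width=128, height=128)
```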
An example of the determination 322 of the optical flow field in the 2D case will be discussed next.
For a given resolution level l, the corresponding flow field regressor function ƒ(l), defined as (FCE, θΣ) = ƒ(l)(EXs, CXm), is a neural network that outputs a 2×W(l)×H(l) flow field tensor FCE and a 3×W(l)×H(l) flow covariance parameter tensor θΣ (variability tensor). Each spatial grid cell in the output tensors is enumerated with the index i∈[1, N(l)], where N(l) = W(l)H(l). This index is used to denote ƒi, the flow vectors from each grid cell of FCE, θi, the parameters of the covariance matrix for each flow vector, and pi, the grid cell center point. The neural network is trained with ground truth flow fields Fgt using a log-likelihood loss in which the parameters θi of the covariance matrices Σ(θi) at each grid cell are regression variables; the dependence on the resolution level l is implicit.
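The exact loss expression is not reproduced here. The sketch below shows a Gaussian negative log-likelihood of the common form, with the 2×2 covariance at each cell parameterized through a lower-triangular Cholesky factor built from the three regressed values; this parameterization is an assumption made for the sketch, not necessarily the one used by the regressor described above.

```python
import torch

def flow_nll_loss(flow_pred, theta, flow_gt):
    """Negative log-likelihood of ground-truth flow under per-cell Gaussians.

    flow_pred, flow_gt: (B, 2, H, W) flow fields.
    theta:              (B, 3, H, W) covariance parameters; here assumed to be the
                        entries of a lower-triangular Cholesky factor
                        L = [[exp(a), 0], [c, exp(b)]] so that Sigma = L L^T.
    The parameterization is an assumption made for this sketch.
    """
    a, b, c = theta[:, 0], theta[:, 1], theta[:, 2]
    r = flow_gt - flow_pred                     # residual, shape (B, 2, H, W)
    # Solve L y = r analytically for the 2x2 triangular factor.
    y0 = r[:, 0] / torch.exp(a)
    y1 = (r[:, 1] - c * y0) / torch.exp(b)
    mahalanobis = y0 ** 2 + y1 ** 2             # r^T Sigma^{-1} r
    log_det = 2.0 * (a + b)                     # log det Sigma
    return 0.5 * (mahalanobis + log_det).mean()

# Minimal call with zero tensors (loss evaluates to 0):
loss = flow_nll_loss(torch.zeros(1, 2, 8, 8), torch.zeros(1, 3, 8, 8), torch.zeros(1, 2, 8, 8))
```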
The neural network that defines the regressor function ƒ(l)(EXs, CXm) may be structured as the structure 800 described next, which comprises feature encoders 802, a correlation module 803, a flow field regressor network 804 and a pose computation module 805.
The encoders 802 may use a U-Net-like structure (see O. Ronneberger et al., “U-Net: Convolutional networks for biomedical image segmentation”, in: International Conference on Medical image computing and computer-assisted intervention, Lecture Notes in Computer Science, Springer, vol. 9351 (2015), pp. 234-241). The U-Net structure may include skip connections to encode the sparse inputs into feature maps with large receptive fields, as will be further discussed with reference to step 320. The network has one down-sampling chain that applies 3×3 two-dimensional convolutions in six groups of three convolutional layers each. Each group halves the spatial dimensions of the tensor. The chain is followed by an up-sampling chain with the same structure as the down-sampling chain, but each group has a skip connection input from the down-sampling chain. The up-sampling chain contains groups up to the spatial dimension determined by the multi-level localization procedure.
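A reduced sketch of such an encoder is given below, assuming a PyTorch implementation. The number of groups and the channel widths are smaller than in the structure described above (six groups of three layers), purely for brevity; all widths are assumptions.

```python
import torch
import torch.nn as nn

def conv_group(c_in, c_out, downsample):
    """Three 3x3 convolutions; the first one optionally halves the spatial size."""
    stride = 2 if downsample else 1
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNetEncoder(nn.Module):
    """Reduced U-Net-like feature encoder: a down-sampling chain followed by an
    up-sampling chain with skip connections.  Depth and widths are illustrative."""
    def __init__(self, c_in=3, widths=(16, 32, 64)):
        super().__init__()
        self.down = nn.ModuleList()
        c = c_in
        for w in widths:
            self.down.append(conv_group(c, w, downsample=True))
            c = w
        self.up = nn.ModuleList()
        for w in reversed(widths[:-1]):
            self.up.append(conv_group(c + w, w, downsample=False))
            c = w

    def forward(self, x):
        skips = []
        for g in self.down:
            x = g(x)
            skips.append(x)
        skips = skips[:-1]               # the deepest tensor is not used as a skip
        for g in self.up:
            x = nn.functional.interpolate(x, scale_factor=2, mode="nearest")
            x = g(torch.cat([x, skips.pop()], dim=1))
        return x

# Three input channels match the [n, mean height, std] feature images sketched above.
feat = UNetEncoder()(torch.randn(1, 3, 128, 128))   # -> (1, 16, 64, 64)
```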
The correlation module 803 computes the scalar products of the feature vector at each location in the encoded sensor feature tensor and feature vectors from a set of neighboring locations around the same position in the map image. To accommodate neighbor locations outside the sensor image borders, the map data image is extracted from a larger area such that it fits all neighbors. The operation results in a vector of scalar products per location in the sensor feature image, where each component is associated with a neighborhood location.
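A sketch of this correlation operation is given below, assuming dense feature tensors and a square neighborhood of displacements; the displacement range and the layout of the output volume are illustrative assumptions.

```python
import torch

def correlation_volume(sensor_feat, map_feat, max_disp=4):
    """Scalar products between each sensor feature vector and the map feature
    vectors in a (2*max_disp+1)^2 neighborhood around the same location.

    sensor_feat: (B, C, H, W); map_feat: (B, C, H + 2*max_disp, W + 2*max_disp),
    i.e. the map image is extracted from a larger area so that all neighbors exist.
    Returns (B, (2*max_disp+1)**2, H, W).  Layout is illustrative.
    """
    B, C, H, W = sensor_feat.shape
    out = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = map_feat[:, :, dy:dy + H, dx:dx + W]
            out.append((sensor_feat * shifted).sum(dim=1))  # dot product per cell
    return torch.stack(out, dim=1)

corr = correlation_volume(torch.randn(1, 16, 64, 64), torch.randn(1, 16, 72, 72))  # -> (1, 81, 64, 64)
```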
The correlation volume is input into the flow field regressor network 804, which may have a base of five 3×3 two-dimensional convolution layers. The base is followed by one branch of three convolution layers for regressing flow field FCE, and another branch of four layers for regressing covariance parameters tensor θΣ.
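A corresponding sketch of the regressor head, with the five-layer base and the two branches described above, is given below; channel widths and activation functions are assumptions.

```python
import torch
import torch.nn as nn

class FlowRegressorHead(nn.Module):
    """Base of five 3x3 convolutions followed by one branch (three layers) regressing
    the 2-channel flow field and one branch (four layers) regressing the 3-channel
    covariance parameters.  Channel widths are illustrative."""
    def __init__(self, c_in, c_mid=64):
        super().__init__()
        layers, c = [], c_in
        for _ in range(5):
            layers += [nn.Conv2d(c, c_mid, 3, padding=1), nn.ReLU(inplace=True)]
            c = c_mid
        self.base = nn.Sequential(*layers)
        self.flow_branch = nn.Sequential(
            nn.Conv2d(c_mid, c_mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_mid, c_mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_mid, 2, 3, padding=1),
        )
        self.cov_branch = nn.Sequential(
            nn.Conv2d(c_mid, c_mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_mid, c_mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_mid, c_mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_mid, 3, 3, padding=1),
        )

    def forward(self, corr):
        x = self.base(corr)
        return self.flow_branch(x), self.cov_branch(x)

# 81 input channels match a 9x9 correlation neighborhood as in the sketch above.
flow, theta = FlowRegressorHead(c_in=81)(torch.randn(1, 81, 64, 64))
```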
A pose computation module 805 produces the end result on the basis of the flow field.
As an alternative to this structure 800, the implementation of the regressor function ƒ(l)(EXs, CXm) may include a deep neural network with a different number of layers than discussed here. It has already been mentioned that the computation of the optical flow field may use a general regression model. It is not necessary for the regression model to be a neural network, or to be machine-learning based at all. The optical flow field may also be computed by a direct method not relying on training or modeling; nor is it necessary for this computation to produce a variability tensor.
Continuing the running 2D example, it will now be described how the computation 324 of the sensor pose from the regressed flow field may be performed as an ML estimation of the pose correction transform TCE. A pose correction transform of this type was exemplified above. For a candidate transform TCE, the flow vector predicted at each grid cell center pi is
hi(TCE) = TCE pi − pi,
which is used to model the regressed flow field vector as
ƒi = hi(TCE) + ei,
where ei ~ N(0, Σθ,i) is the flow vector error, modelled with the regressed covariance matrices. Expressed as a probability density, this corresponds to
p(ƒi | TCE) = N(ƒi; hi(TCE), Σθ,i).
Under the assumption that flow vectors ƒi are conditionally independent, the distribution of the whole flow field can be described as
p(FCE | TCE) = ∏i p(ƒi | TCE).
While this assumption may not be accurate in the general case, it is practical for the purpose of weighting flow vectors in preparation of the pose computation.
In the 2D case, the error correction transform TCE is parameterized by translation [x, y]T and heading angle ϕ, as indicated above.
The resulting log-likelihood log L can be expressed in terms of quantities μi = μi(ϕ, pi, ƒi), which can be evaluated for any given ϕ. A set of M heading angle hypotheses ϕj, j∈[1, M], is sampled from a suitable search range, and all μi,j, i∈[1, N], j∈[1, M], are computed. Then the x̂j, ŷj that maximize log L for each hypothesis are computed analytically.
The ML estimate x̂, ŷ, ϕ̂ is found by identifying the heading hypothesis ϕj and the corresponding pair x̂j, ŷj that evaluate to the highest likelihood over all j. Finally, T̂CE is constructed from the estimated parameters and the sought sensor pose transform is computed as
TMS = TMC T̂CE TES.
This value—or a suitably formatted map position derived from the pose transform TMS—may then be output 332.
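The following sketch illustrates one way the hypothesis search and the closed-form translation could be carried out, assuming that μi is the regressed flow minus the rotational part of the predicted flow, and that the translation maximizing the likelihood for a given heading is the precision-weighted mean of these residuals. These assumptions are made for the sketch only, since the corresponding closed-form expressions are not reproduced above.

```python
import numpy as np

def ml_pose_from_flow(p, f, cov, phi_hypotheses):
    """ML estimate of (x, y, phi) from flow vectors f at cell centers p with per-vector
    2x2 covariances cov, by exhaustive search over heading hypotheses and a closed-form
    translation for each hypothesis.  The definition of mu_i and the precision-weighted
    translation are assumptions of this sketch."""
    prec = np.linalg.inv(cov)                        # (N, 2, 2) precision matrices
    best = (-np.inf, None)
    for phi in phi_hypotheses:
        c, s = np.cos(phi), np.sin(phi)
        R = np.array([[c, -s], [s, c]])
        mu = f - (p @ R.T - p)                       # residual flow for this heading
        # Closed-form translation: precision-weighted mean of the residuals.
        A = prec.sum(axis=0)
        b = np.einsum("nij,nj->i", prec, mu)
        t = np.linalg.solve(A, b)
        r = mu - t                                   # remaining errors
        loglik = -0.5 * np.einsum("ni,nij,nj->", r, prec, r)
        if loglik > best[0]:
            best = (loglik, (t[0], t[1], phi))
    return best[1]

# Example use with unit covariances and a coarse heading search:
# x_hat, y_hat, phi_hat = ml_pose_from_flow(p, f, np.tile(np.eye(2), (len(p), 1, 1)),
#                                           np.deg2rad(np.linspace(-20, 20, 81)))
```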
If the optical flow field was computed by an algorithm that does not produce a variability measure, the covariance matrices Σθ,i may be set to identity (unit) matrices. Experimental results for a validation set suggest that the improvements obtained by using actual values of the variability are sometimes moderate. In the general case, and particularly if sparsening pre-processing is applied (see step 320 below), the availability of covariance or another variability measure may be of significant value.
The method 300 may be repeated as often as necessary to provide a fresh lidar sensor pose estimate. For instance, the repetition may be initiated after a predetermined delay, when a movement of the lidar sensor has been detected or in response to some other triggering event.
Multi-scale localization is an optional further development of the method 300. Then, to overcome issues with the limited search space connected with the use of a correlation volume—and with only a limited impact on computational performance—a coarse-to-fine approach may be used to successively resolve the flow field in a pyramidal process. Since the flow field is expected to follow a rigid transform, the pose may be estimated in each iteration, and the next iteration's resolution is increased relative to the current iteration. In situations when the prior pose is precise, it may be sufficient to compute only the finest resolution flow. For occasional re-locating, however, the coarser localization levels can be applied initially to bootstrap the error. In practice, this means that multiple versions of the localization pipeline described above are applied in sequence, one per resolution level.
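A minimal sketch of such a coarse-to-fine loop is shown below; the single-level pipeline is passed in as a callable placeholder, and the cell sizes are illustrative assumptions.

```python
def coarse_to_fine_localize(localize_at_level, prior_pose, cell_sizes):
    """Pyramidal localization: run the single-level pipeline (passed in as a callable)
    from coarse to fine resolution, feeding each level's pose estimate to the next
    level as prior.  Sketch only."""
    pose = prior_pose
    for cell_size in cell_sizes:          # e.g. [2.0, 1.0, 0.25] metres per cell
        pose = localize_at_level(pose, cell_size)
    return pose

# Example with a dummy single-level pipeline that simply returns its prior:
print(coarse_to_fine_localize(lambda pose, cs: pose, (0.0, 0.0, 0.0), [2.0, 1.0, 0.25]))
```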
Returning to the method 300, a number of optional steps and further developments will now be described.
Alternatively or additionally, the method 300 may include a step 316 of augmenting the lidar height map with lidar intensity information, wherein the augmented (or enriched) height map replaces the height map in the subsequent processing steps. This is to say, the optical flow field is determined on the basis of the augmented height map and the map image. The intensity information may be acquired from the same lidar sensor which produces the range information from which the height map is established. This may allow a more precise feature vector correlation estimate and thus produce more accurate optical flow field computation.
Alternatively or additionally, the method 300 may include a pre-processing step 318, where the lidar height map and the map image are processed into respective feature images (cf. feature vector x introduced above). The pre-processing may be implemented by the feature encoder neural networks 802 described above.
Alternatively or additionally, the described method 300 is combined with statistical temporal filtering, such as Kalman filtering or particle filtering. Repeated ML estimates of the lidar sensor pose—or map positions derived from these—may be the sole data source (observation, measurement) provided to the Kalman filter. Alternatively, the lidar sensor pose may be combined with other data sources such as GNSS or dead reckoning. The data sources which can be relied upon as sensor pose priors are generally useful as inputs to the Kalman filtering as well. The combining with a Kalman filter may improve the stability of the estimated position and may lessen the impact of noise.
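For illustration, a minimal Kalman filter over [x, y, ϕ] with a random-walk motion model, using each ML pose estimate as the observation, could look as sketched below; all noise levels are placeholder assumptions, not values from this disclosure.

```python
import numpy as np

class PoseKalmanFilter:
    """Minimal Kalman filter over [x, y, phi] with a random-walk motion model, using
    each ML pose estimate as the observation.  Noise levels are illustrative."""
    def __init__(self, initial_pose, q=0.05, r=0.02):
        self.x = np.asarray(initial_pose, dtype=float)   # state: x, y, phi
        self.P = np.eye(3)                               # state covariance
        self.Q = q * np.eye(3)                           # process noise
        self.R = r * np.eye(3)                           # measurement noise

    def update(self, measured_pose):
        # Predict (identity motion model, growing uncertainty).
        self.P = self.P + self.Q
        # Update with the new pose observation (observation matrix H = I).
        K = self.P @ np.linalg.inv(self.P + self.R)
        self.x = self.x + K @ (np.asarray(measured_pose) - self.x)
        self.P = (np.eye(3) - K) @ self.P
        return self.x

kf = PoseKalmanFilter([0.0, 0.0, 0.0])
kf.update([0.3, -0.1, 0.02])   # e.g. a fresh ML estimate from the method 300
```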
The neural networks described above were trained on a dataset generated in the CARLA simulation environment.
The dataset includes natural occlusions, namely, that the proximal map contains data that is not seen in sensor measurements, due to objects blocking the line of sight. Thus, the algorithm discussed above is implicitly trained to manage such occlusions, and the following evaluations test the algorithm's performance in partially occluded scenes. The opposite scenario, where the measurement scans contain data from objects that are not in the map, is not included in the dataset.
The CARLA-based training data was infinitely augmented by rotation, such that both the map image and sensor points of each sample were rotated randomly in the horizontal plane. This was found to be necessary to avoid overfitting, since the included CARLA worlds have a strong emphasis on straight roads, features or buildings in north-south or east-west orientations. For training optimization, ADAM with its standard parameters was used; see D. P. Kingma et al., “ADAM: A method for stochastic optimization”, arXiv:1412.6980. The step size was fixed at 0.0003. At cold start, it turned out necessary to use a regular L1 loss function to find a starting point of non-trivial features.
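A minimal sketch of the optimization loop under these settings is given below; the model, data loader and loss function are assumed to exist (for instance, the sketches given earlier), and the rotation augmentation is assumed to be applied when the samples are generated.

```python
import torch

def train(model, train_loader, loss_fn, epochs=10):
    """Adam with default betas and the fixed step size 0.0003 mentioned in the text.
    `model` maps (sensor tensor, map tensor) to (flow, covariance parameters);
    `loss_fn` is e.g. the Gaussian NLL sketched earlier.  Sketch only."""
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    for _ in range(epochs):
        for sensor_x, map_x, flow_gt in train_loader:
            flow_pred, theta = model(sensor_x, map_x)
            loss = loss_fn(flow_pred, theta, flow_gt)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```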
The present invention may further be embodied as a navigation system with an ability (e.g., by comprising corresponding interfaces) to acquire a lidar height map from a lidar sensor, either directly or through some intermediary, and to retrieve the predefined map image from an internal or shared memory. The navigation system may further include first processing circuitry configured to determine an optical flow field, which relates the lidar height map and the map image, and second processing circuitry configured to compute an ML estimate of the lidar sensor pose on the basis of the determined optical flow field.
The aspects of the present disclosure have mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the invention, as defined by the appended patent claims. For example, the generalization of the present techniques to higher dimensions, wherein three- or higher-dimensional generalized optical flow may be used, remains within the scope of the present invention.
Claims
1. A method for determining a lidar sensor pose with respect to a predefined map image, comprising:
- acquiring, by a computing device, a lidar height map;
- determining, by the computing device, an optical flow field, which relates the lidar height map and the map image; and
- computing, by the computing device, a maximum-likelihood (ML) estimate of the lidar sensor pose on the basis of the determined optical flow field.
2. The method of claim 1, wherein the computing of the ML estimate of the lidar sensor pose includes maximizing a likelihood of a candidate error correction transform given the determined optical flow field.
3. The method of claim 1, further comprising:
- augmenting the lidar height map with lidar intensity information,
- wherein the optical flow field is determined on the basis of the augmented height map and the map image.
4. The method of claim 1, wherein the optical flow field is a two-dimensional optical flow.
5. The method of claim 1, wherein the optical flow field is a three-dimensional generalized optical flow.
6. The method of claim 1, further comprising:
- pre-processing the lidar height map and the map image into respective feature images; and
- sparsening the feature image of the map image,
- wherein the optical flow field is determined on the basis of the respective feature images.
7. The method of claim 1, further comprising:
- initially obtaining a coarse global localization; and
- extracting the map image as a subarea of a larger predetermined map image.
8. The method of claim 1, wherein:
- the optical flow field is determined by a regression model, which additionally produces an associated variability tensor; and
- the ML estimate of the lidar sensor pose is computed further on the basis of the variability tensor.
9. The method of claim 8, wherein the regression model is implemented by a trained neural network.
10. The method of claim 1, further comprising repeating the steps of determining an optical flow field and computing an ML estimate of the lidar sensor pose, together with optional further steps, at an increased spatial resolution and applying the estimated lidar sensor pose as a prior.
11. A navigation system comprising:
- a communication interface for acquiring a lidar height map;
- a memory adapted for storing a predefined map image;
- first processing circuitry configured to determine an optical flow field, which relates the lidar height map and the map image; and
- second processing circuitry configured to compute a maximum-likelihood (ML) estimate of the lidar sensor pose on the basis of the determined optical flow field.
12. The navigation system of claim 11, wherein the first processing circuitry implements a regression model for producing the optical flow field and an associated variability tensor.
13. The navigation system of claim 11, wherein the first processing circuitry includes a trainable component.
14. The navigation system of claim 11, further comprising a Kalman filter configured for position tracking at least partly on the basis of the estimated lidar pose.
15. A computer program product stored on a non-transitory computer-readable storage medium and including instructions to cause a processor device to:
- acquire a lidar height map;
- determine an optical flow field, which relates the lidar height map and the map image; and
- compute a maximum-likelihood (ML) estimate of the lidar sensor pose on the basis of the determined optical flow field.
Type: Application
Filed: Oct 28, 2021
Publication Date: May 19, 2022
Inventor: Anders Sunegård (Göteborg)
Application Number: 17/513,433