Robust Reconstruction of Indoor Scenes

The purpose of this toolbox is to evaluate global registration algorithms in the context of scene reconstruction. Local fragments are first constructed from short term segments of RGB-D scans using RGB-D odometry. Each fragment is represented as a point cloud $\mathbf{P}_{i}$. Its pose is denoted as a 4 × 4 transformation matrix $\mathbf{T}_{i}$. Consider a pair of non-consecutive fragments $(\mathbf{P}_{i}, \mathbf{P}_{j})_{(i+1 \lt j)}$. They may overlap in space and create a loop closure constraint although the transformation that aligns them cannot be directly computed from the odometry. Therefore, we need a global registration algorithm that tests non-consecutive fragment pairs and computes a transformation $\mathbf{T}_{ji}$ that aligns fragment $\mathbf{P}_{j}$ to fragment $\mathbf{P}_{i}$ if the alignment yields sufficient overlap.

This toolbox is designed to evaluate such global registration algorithms. An algorithm is required to take a set of fragments as input, and test all pairs of non-consecutive fragments $(\mathbf{P}_{i}, \mathbf{P}_{j})_{(i+1 \lt j)}$ — consecutive fragment pairs are not considered in this evaluation. If the registration algorithm decides that a pair of fragments can be aligned (our guideline is to have at least 30% overlap), it needs to output a transformation matrix $\mathbf{T}_{ji}$ to be evaluated. All the transformation matrices of successfully aligned fragment pairs are stored in a .log file (see the File Format section). The three numbers in the metadata of transformation matrix $\mathbf{T}_{ji}$ are: fragment index $i$, fragment index $j$, and the total number of fragments.

An example output of a global registration algorithm is as follows:

4   6   57
 0.931813  -0.151833   0.329652   1.2445
-0.0433474  0.855229   0.516434  -0.78035
-0.36034   -0.49551    0.790332   0.711526
 0         -0         -0          1
7   10  57
-0.0657651 -0.0242835  0.99754    1.44585
-0.983248   0.171894  -0.0606383  3.72015
-0.169998  -0.984816  -0.0351812  3.10725
 0          0          0          1
1   3   57
 0.554312   0.236264  -0.798071   1.46312
-0.2289     0.965163   0.126745   0.205479
 0.800214   0.112423   0.589083  -1.43515
 -0        -0         -0          1
7   11  57
 0.977749  -0.0522724  0.20316   -0.0895188
 0.202217  -0.0228047 -0.979075   2.13087
 0.0558116  0.998373  -0.011727  -1.28311
 0          0          0          1
......

Here are the fragments from four sequences that should be used for testing:

Scene	Fragments (PLY or PCD)		Evaluation File	Source
Living Room 1	PLY (390M)	PCD (394M)	ZIP (1M)	Dataset
Living Room 2	PLY (332M)	PCD (336M)	ZIP (1M)
Office 1	PLY (282M)	PCD (286M)	ZIP (1M)
Office 2	PLY (330M)	PCD (334M)	ZIP (1M)

Code

For an example of complete evaluation on all four sequences refer to https://github.com/qianyizh/ElasticReconstruction/blob/master/Matlab_Toolbox/Example/demo_mrRegistrationEvaluation.m

The following Matlab script evaluates the registration result stored in "result.log".

gt = mrLoadLog( 'gt.log' );
gt_info = mrLoadInfo( 'gt.info' );
result = mrLoadLog( 'result.log' );
[ recall, precision ] = mrEvaluateRegistration( result, gt, gt_info );

Ground Truth

The ground truth data is created from synthesized data. Local fragments $\{\mathbf{P}_{i}\}$ are constructed within a 3m × 3m × 3m volume with 6mm resolution. The fragments are created from depth images synthesized with a noise model measured on a real PrimeSense camera. The fragments are fused using the ground truth trajectory. Since the synthesized depth images are affected by both noise and intrinsic camera distortion, the constructed fragments do not perfectly represent the true geometry. This is consistent with artifacts present in real data.

Let $\mathbf{T}^{*}_{i}$ be the ground truth pose of fragment $\mathbf{P}_{i}$, derived from the ground truth trajectory. Each pair of non-consecutive fragments $(\mathbf{P}_{i}, \mathbf{P}_{j})_{(i+1 \lt j)}$ is tested using the ground truth transformation $\mathbf{T}^{*}_{ji}=\mathbf{T}^{*-1}_{i}\mathbf{T}^{*}_{j}$ to see if their overlap is larger than 30% of $\min\{|\mathbf{P}_{i}|,|\mathbf{P}_{j}|\}$. To perform the evaluation efficiently, each fragment is uniformly downsampled at 5cm resolution. For each point $\mathbf{q}$ in the downsampled fragment $\mathbf{P}_{j}$, we look for the nearest point of $\mathbf{T}^{*}_{ji}\mathbf{q}$ in the downsampled $\mathbf{P}_{i}$, denoted as $\mathbf{p}$. If the distance between $\mathbf{T}^{*}_{ji}\mathbf{q}$ and $\mathbf{p}$ is less than 7.5cm, point pair $(\mathbf{p},\mathbf{q})$ is considered a ground-truth correspondence and is included in ${\mathcal K}^{*}_{ij}$. We define $(\mathbf{P}_{i}, \mathbf{P}_{j})$ to be a ground-truth loop closure if the size of ${\mathcal K}^{*}_{ij}$ is larger than 30% of the size of downsampled $\mathbf{P}_{i}$ or $\mathbf{P}_{j}$.

For each synthesized scan, we create a set of fragments $\mathbb{P}=\{\mathbf{P}_{i}\}$ and a set of ground-truth loop closures $\mathbb{L}^{*}$. Each loop closure has a ground-truth transformation $\mathbf{T}^{*}_{ji}$ and a ground-truth correspondence set ${\mathcal K}^{*}_{ij}$.

Evaluation

As described in the introduction, to be evaluated, a global registration algorithm needs to output a set of loop closures $\mathbb{L}$ with transformations $\{\mathbf{T}_{ji}\}$. A loop closure is a true positive if and only if it satisfies the following conditions:

It is also a loop closure in $\mathbb{L}^{*}$, and
$\mathbf{T}_{ji}$ brings correspondences in $ {\mathcal K}^{*}_{ij}$ sufficiently close. Specifically, we require the RMSE of the ground truth correspondences to be below a threshold $\tau=0.2$m:

\begin{eqnarray} E^{2}_{RMSE}=\frac{1}{| {\mathcal K}^{*}_{ij}|}\sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\|\mathbf{p}^{*}-\mathbf{T}_{ji}\mathbf{q}^{*}\|^{2} \lt \tau^2.\tag{1}\label{eqn:evaluation} \end{eqnarray} For efficiency, we approximately evaluate Equation (\ref{eqn:evaluation}) using the following method. \begin{eqnarray} E^{2}_{RMSE}&=&\frac{1}{| {\mathcal K}^{*}_{ij}|}\sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\|\mathbf{p}^{*}-\mathbf{T}_{ji}\mathbf{q}^{*}\|^{2}\tag{2}\\ &\approx&\frac{1}{| {\mathcal K}^{*}_{ij}|}\sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\|\mathbf{T}^{*}_{ji}\mathbf{q}^{*}-\mathbf{T}_{ij}\mathbf{q}^{*}\|^{2}\tag{3}\label{eqn:approximate}\\ &=&\frac{1}{| {\mathcal K}^{*}_{ij}|}\sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\|\mathbf{q}^{*}-\mathbf{T}^{*-1}_{ji}\mathbf{T}_{ji}\mathbf{q}^{*}\|^{2}.\tag{4}\label{eqn:error} \end{eqnarray} Line (\ref{eqn:approximate}) uses the proximity of correspondences: $(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij} \Rightarrow \| \mathbf{p}^{*} - \mathbf{T}^{*}_{ji} \mathbf{q}^{*} \| \lt \varepsilon$. Let $\xi=(\omega,\mathbf{t})=(\alpha,\beta,\gamma,a,b,c)$ be the standard local parameterization of $\mathbf{T}^{*-1}_{ji}\mathbf{T}_{ji}$ which collates a rotational component $\omega$ and a translation $\mathbf{t}$. Locally, when $\mathbf{T}^{*}_{ji}\approx\mathbf{T}_{ji}$, \begin{eqnarray} \mathbf{T}^{*-1}_{ji}\mathbf{T}_{ji} & \approx & \left(\begin{array}{c c c c} 1 & -\gamma & \beta & a\\ \gamma & 1 & -\alpha & b\\ -\beta & \alpha & 1 & c\\ 0 & 0 & 0 & 1 \end{array}\right),\tag{5}\label{eqn:parameterization}\\ \mathbf{T}^{*-1}_{ji}\mathbf{T}_{ji}\mathbf{q}^{*} & \approx & \mathbf{q}^{*} + \omega\times\mathbf{q}^{*}+\mathbf{t}.\tag{6} \end{eqnarray} Equation (\ref{eqn:error}) can be locally approximated as \begin{eqnarray} E^{2}_{RMSE} & \approx &\frac{1}{| {\mathcal K}^{*}_{ij}|}\sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\|\omega\times\mathbf{q}^{*}+\mathbf{t}\|^{2}\tag{7}\\ &=&\frac{1}{| {\mathcal K}^{*}_{ij}|}\sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\left\|\left[ -\left[\mathbf{q}^{*}\right]_{\times}\!\mid \mathbf{I} \right]\xi\right\|^{2},\tag{8} \end{eqnarray} where $\left[\mathbf{q}^{*}\right]_{\times}$ is the skew-symmetric matrix form of the cross product with $\mathbf{q}^{*}$, and $\mathbf{I}$ is the 3 × 3 identity matrix. Define $\mathbf{G}_{\mathbf{q}^{*}}=\left[ -\left[\mathbf{q}^{*}\right]_{\times}\!\mid \mathbf{I} \right]$. \begin{eqnarray} E^{2}_{RMSE} & \approx &\frac{1}{| {\mathcal K}^{*}_{ij}|}\sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\left\|\mathbf{G}_{\mathbf{q}^{*}}\xi\right\|^{2}\tag{9}\\ &=&\frac{1}{| {\mathcal K}^{*}_{ij}|}\sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\xi^{\top}\mathbf{G}_{\mathbf{q}^{*}}^{\top}\mathbf{G}_{\mathbf{q}^{*}}\xi\tag{10}\\ &=&\frac{1}{| {\mathcal K}^{*}_{ij}|}\xi^{\top}\left(\sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\mathbf{G}_{\mathbf{q}^{*}}^{\top}\mathbf{G}_{\mathbf{q}^{*}}\right)\xi.\tag{11}\label{eqn:final} \end{eqnarray} Let \begin{eqnarray} \Lambda^{*}_{ji} = \sum_{(\mathbf{p}^{*},\mathbf{q}^{*})\in {\mathcal K}^{*}_{ij}}\mathbf{G}_{\mathbf{q}^{*}}^{\top}\mathbf{G}_{\mathbf{q}^{*}}.\tag{12} \end{eqnarray} It is invariant to $\mathbf{T}_{ji}$, and can be precomputed as the covariance matrix from the ground truth correspondence set $ {\mathcal K}^{*}_{ij}$. Indeed, we precompute both $\{\mathbf{T}^{*}_{ji}\}$ and $\{\Lambda^{*}_{ji}\}$ and store them in gt.log and gt.info, respectively. We use the approximation in Equation (\ref{eqn:final}) to rapidly perform the evaluation. Refer to https://github.com/qianyizh/ElasticReconstruction/blob/master/Matlab_Toolbox/Core/mrEvaluateRegistration.m. Finally, we compute recall and precision as: \begin{eqnarray} \textrm{Recall} &=& \frac{\textrm{# of true positives}}{\textrm{# of ground truth loop closures}},\tag{13}\\ \textrm{Precision} &=& \frac{\textrm{# of true positives}}{\textrm{# of detected loop closures}}\tag{14}. \end{eqnarray} We are looking for efficient global registration algorithms with high recall. To date, the best performance comes from our variant of Rusu et al., which achieves 59.2% recall and 19.6% precision on our benchmark dataset. Please let us know if your algorithm achieves better results.