MicroFlow: Domain-Specific Optical Flow for Ground Deformation Estimation in Seismic Events

Juliette Bertrand1
Sophie Giffard-Roisin2
James Hollingsworth2
Julien Mairal1
1Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK
2Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, IRD, Univ. Gustave Eiffel, ISTerre, 38000 Grenoble
Paper
GitHub
Dense ground displacement measurements are crucial for geological studies but are impractical to collect directly. Instead, displacement fields are estimated from optical satellite images taken at different acquisition times. From (a) high-resolution to (b)-(d) medium-resolution remote sensing optical images acquired before and after real seismic events, this task involves small to very small displacements, requiring sub-pixel precision for accurate estimates and robustness to the strong temporal variations introduced by the image acquisition process.
Traditionally designed to achieve sub-pixel precision, classical models (1-2) are overly sensitive to temporal changes, while data-driven models (3-4) using non-iterative refinements (3) or correlation-based backbones (4) fail to achieve sub-pixel precision.

MicroFlow (5) achieves sub-pixel precision, preserves fault-line sharpness, and improves robustness to external perturbations of the signal, such as (c) geological and (d) human activities, among others.

Method

Correlation-independent backbone

We estimate the displacement field from the input images using a modified U-Net encoder-decoder network \(g\) tailored for displacement field estimation. Encoder-decoder networks traditionally rely on correlation-dependent encoders (b), where the input images are first processed independently through an image encoder \(e_I\) in a siamese fashion, before being fed to a correlation layer followed by a correlation encoder \(e_C\). Instead, we use a correlation-independent encoder (a), where the input images are processed jointly by an image-pair encoder \(e_{P}\). Using a correlation-independent encoder is necessary to achieve sub-pixel precision.
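To see why a correlation layer can limit precision, here is a toy numpy illustration (not the paper's model): a brute-force correlation search returns displacements only on the integer pixel grid, whereas a correlation-independent encoder sees the stacked image pair directly and can regress continuous, sub-pixel values.

```python
import numpy as np

# Toy illustration (not the paper's architecture): a correlation layer
# matches patches on a discrete grid, so its raw output is inherently
# integer-valued -- one reason correlation-based backbones struggle to
# reach sub-pixel precision.
rng = np.random.default_rng(0)
I1 = rng.random((32, 32))          # pre-event image
true_shift = 3                     # integer shift for the toy example
I2 = np.roll(I1, true_shift, axis=1)  # post-event image

# Brute-force cross-correlation over candidate horizontal shifts.
shifts = range(-5, 6)
scores = [np.sum(I1 * np.roll(I2, -s, axis=1)) for s in shifts]
best = list(shifts)[int(np.argmax(scores))]
print(best)  # -> 3: the correlation peak lives on the integer grid

# A correlation-independent encoder e_P instead consumes the channel-wise
# stack of the two images jointly, with no discrete matching step.
pair_input = np.stack([I1, I2], axis=0)  # shape (2, H, W)
```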

Iterative refinements with explicit warping and weighted loss

Instead of estimating the displacement field in a single step, we iteratively refine it by computing updates \(\Delta df_i = g_{i}(x_i)\) and setting \(df_i = df_{i-1} + \Delta df_i\). The refined field \(df_i\) is used to warp \(I_2\) via a Spatial Transformer Network, producing \(I_2^i\), which is stacked with \(I_1\) for the next iteration. Networks \((g_1, ..., g_n)\) can either have separate weights (\(g_1 \neq g_2 \neq \dots \neq g_n\)) or share weights (\(g_1 = g_2 = \dots = g_n = g\)). In both cases, parameters are optimized end-to-end. The model is trained using a weighted loss instead of the standard \(L_1\) loss on the final displacement field: \[ \mathcal{L} = \sum_{i=1}^{n} \mathcal{L}_{i} = \sum_{i=1}^{n} \gamma^{n-i} |df_{gt} - df_i| \] where \(\gamma\) is an attenuation factor prioritizing later predictions. This loss encourages a good estimate at each iteration, rather than letting early iterations drift and relying on the final model to compensate.
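The weighted loss can be sketched in a few lines of numpy (assumptions: toy 1-D displacement fields, and a fixed list of per-iteration predictions standing in for the network outputs \(g_i\)):

```python
import numpy as np

gamma = 0.8                          # attenuation factor (assumed value)
df_gt = np.array([1.0, 2.0, 3.0])    # toy ground-truth displacement field
preds = [np.array([0.5, 1.0, 1.5]),  # df_1: coarse first estimate
         np.array([0.9, 1.8, 2.7]),  # df_2: refined estimate
         np.array([1.0, 2.0, 2.9])]  # df_3: final estimate

n = len(preds)
# L = sum_i gamma^(n-i) * |df_gt - df_i|: later iterations weigh more,
# but every intermediate prediction contributes to the loss.
loss = sum(gamma ** (n - i) * np.abs(df_gt - df_i).mean()
           for i, df_i in enumerate(preds, start=1))
print(round(float(loss), 4))  # -> 0.8333
```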

A posteriori regularization

Estimating the dense ground displacement field \( df_{\text{earth}} \) requires inferring the mapping between points on the Earth’s surface, which does not translate exactly to a mapping between pixel intensities in the images, due to the acquisition gaps causing scene perturbations. Data-driven methods leverage priors to estimate \( df_{\text{earth}} \), but real-world annotation scarcity and temporal variability introduce noise, yielding \( df_{\text{noisy}} \). The challenge is then to recover \( df_{\text{earth}} \) from \( df_{\text{noisy}} \), posing an inverse problem. We frame it as an optimization task aiming to denoise the field while preserving spatial smoothness: \[ df_{\text{denoised}} = \arg \min_{df \in \mathbb{R}^{H \times W \times 2}} \left( \|df - df_{\text{noisy}}\|_2^2 + \lambda \psi(df) \right) \] We choose a non-convex penalty \(\psi(u) = \log(|\nabla u| + \epsilon) \), which encourages smoothness away from the fault while preserving the sharpness of the fault line.
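This optimization can be sketched with plain gradient descent in numpy. This is a rough 1-D illustration with assumed values for \(\lambda\), \(\epsilon\), and the step size; the paper's actual solver and hyperparameters may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
df_earth = (x > 0.5).astype(float)  # sharp "fault" step in a 1-D profile
df_noisy = df_earth + 0.05 * rng.standard_normal(x.size)

lam, eps, step = 0.05, 0.05, 0.05   # assumed hyperparameters

def energy(u):
    # data fidelity + non-convex edge-preserving penalty log(|grad u| + eps)
    grad = np.diff(u)
    return np.sum((u - df_noisy) ** 2) + lam * np.sum(np.log(np.abs(grad) + eps))

def energy_grad(u):
    g = 2.0 * (u - df_noisy)
    d = np.diff(u)
    w = lam * np.sign(d) / (np.abs(d) + eps)  # derivative of log(|d| + eps)
    g[:-1] -= w   # each finite difference d_j touches u_j (with -1) ...
    g[1:] += w    # ... and u_{j+1} (with +1)
    return g

u = df_noisy.copy()
for _ in range(500):
    u -= step * energy_grad(u)

# The denoised profile has lower energy: smoother off the fault,
# while the diminishing penalty gradient leaves the sharp step intact.
print(energy(u) < energy(df_noisy))  # -> True
```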

Training on semi-synthetic data

With limited manually annotated data, quantitatively assessing model performance on real data remains challenging. We first rely on semi-synthetic benchmarks on FaultDeform, which, though imperfect, offer valuable insights and allow quantitative comparisons. We use the standard average endpoint error (EPE) metric to quantify estimation accuracy and introduce a smoothness metric (\(||{\nabla f} ||_2\)) to assess noise levels in near-fault and non-fault regions, ensuring adherence to physical constraints. Our proposed method MicroFlow shows significant improvement over two widely used methods in the geoscience community, COSI-Corr and MicMac.
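The two evaluation quantities can be sketched as follows (toy numpy implementations following the standard EPE definition and a simple finite-difference gradient norm; the exact evaluation code may differ):

```python
import numpy as np

def epe(df_pred, df_gt):
    """Average endpoint error: mean Euclidean distance between predicted
    and ground-truth displacement vectors."""
    return float(np.mean(np.linalg.norm(df_pred - df_gt, axis=-1)))

def smoothness(df):
    """Mean gradient magnitude of the field: higher values indicate a
    noisier, less physically plausible displacement field."""
    mags = []
    for c in range(df.shape[-1]):
        gy, gx = np.gradient(df[..., c])
        mags.append(np.sqrt(gy ** 2 + gx ** 2))
    return float(np.mean(mags))

# Toy fields: a constant ground-truth flow and a uniformly biased prediction.
H, W = 8, 8
df_gt = np.zeros((H, W, 2))
df_gt[..., 0] = 1.0                  # 1 px eastward displacement everywhere
df_pred = df_gt.copy()
df_pred[..., 0] += 0.5               # constant 0.5 px bias

print(epe(df_pred, df_gt))           # -> 0.5
print(smoothness(df_pred))           # -> 0.0 (perfectly smooth field)
```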


Inference on real-world examples

We also perform a comprehensive qualitative evaluation using real-world pre- and post-seismic images from the Ridgecrest earthquake region, captured by medium- and high-resolution sensors. Our results show that MicroFlow produces accurate displacement estimates across small and very small displacement ranges, a first for a deep learning-based optical flow model.

[Interactive comparisons: COSI-Corr vs MicroFlow (EW and NS directions); with vs without LTV regularization (EW and NS directions)]


Paper and Supplementary Material

J. Bertrand, S. Giffard-Roisin, J. Hollingsworth, J. Mairal.
MicroFlow: Domain-Specific Optical Flow for Ground Deformation Estimation in Seismic Events.
(hosted on ArXiv)




Acknowledgements

The authors would like to sincerely thank Tristan Montagnon for his helpful discussions. This project was supported by ANR 3IA MIAI@Grenoble Alpes (ANR-19-P3IA-0003) and by ERC grant number 101087696 (APHELEIA project). This work was granted access to the HPC resources of IDRIS under the allocation [AD011015113, AD010115035] made by GENCI. This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.