Overview

AerialFormer is a semantic segmentation model for top-down aerial imagery. It extracts Transformer features in the contracting path and uses lightweight multi-dilated convolutional components in the expanding path, so that global context is combined with high-resolution spatial detail.
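
To make the idea of a multi-dilated convolutional component more concrete, the sketch below applies one small kernel at several dilation rates in parallel and reports how much of the input each rate covers. It is an illustration only, not the project's implementation: the 1-D setting, the function names, the kernel of ones, and the dilation rates (1, 2, 3) are all assumptions chosen for readability. In a real decoder the parallel outputs would typically be concatenated along the channel axis of a 2-D feature map.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal: Sequence[float],
                   kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation form)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Neighbouring kernel taps are `dilation` samples apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_dilated_block(signal: Sequence[float],
                        kernel: Sequence[float],
                        dilations: Sequence[int]) -> List[List[float]]:
    """Run the same kernel at several dilation rates, one output per rate.

    The list of outputs stands in for channel-wise concatenation.
    """
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


# Hypothetical settings for illustration.
DILATIONS = (1, 2, 3)
KERNEL = (1.0, 1.0, 1.0)
IMPULSE = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0)

branches = multi_dilated_block(IMPULSE, KERNEL, DILATIONS)
spans = [effective_kernel_size(len(KERNEL), d) for d in DILATIONS]
```

With a 3-tap kernel, the three branches cover 3, 5, and 7 input samples at the same parameter count per branch, which is why stacking dilation rates is a cheap way to widen the receptive field without reducing resolution.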

Research Context

Aerial image segmentation must cope with strong class imbalance, complex backgrounds, intra-class heterogeneity, inter-class homogeneity, and tiny objects. The project addresses these challenges with a hierarchical, multi-resolution encoder-decoder design.