Accurate building segmentation in unmanned aerial vehicle (UAV) orthophotos remains challenging because buildings are visually similar to non-target elements such as trees, roads, and background clutter. This study proposes an enhanced segmentation method, referred to as RGB-DSM-IMP (M3), which integrates RGB imagery, Digital Surface Model (DSM) data, and a novel background-removal preprocessing step. The Mask Region-Based Convolutional Neural Network (Mask R-CNN) framework was used to evaluate three segmentation strategies: a baseline model using only RGB imagery, a second model combining RGB imagery with DSM data, and the proposed model, which incorporates both data types along with the preprocessing step. All models were trained and tested on drone-acquired images representing a variety of building types and environmental conditions. Performance was evaluated using precision, recall, F1-score, average precision (AP), mean intersection over union (mIoU), and mean average precision (mAP). The proposed model achieved the highest results across all metrics, with an average F1-score of 0.74, mIoU of 0.74, and mAP of 0.63. These findings highlight the benefit of integrating elevation data to improve spatial differentiation and demonstrate that background removal reduces misclassifications caused by visually similar objects. In addition, the method maintained a practical inference time per image, supporting its real-world applicability. Overall, the study shows that combining height-based information with targeted preprocessing significantly improves the accuracy and robustness of building segmentation in complex aerial imagery.
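To make the evaluation metrics concrete, the following is a minimal sketch of how precision, recall, F1-score, and intersection over union can be computed for one predicted building mask against its ground-truth mask. It is an illustration only, not the authors' evaluation code: the function name `mask_metrics` and the toy masks are hypothetical, and the computation is pixel-level, whereas the AP and mAP reported for Mask R-CNN are normally derived from instance-level matching across IoU thresholds.

```python
import numpy as np


def mask_metrics(pred, truth):
    """Pixel-level precision, recall, F1, and IoU for two binary masks.

    `pred` and `truth` are same-shaped arrays where nonzero means "building".
    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    tp = int(np.logical_and(pred, truth).sum())    # building predicted, building present
    fp = int(np.logical_and(pred, ~truth).sum())   # building predicted, none present
    fn = int(np.logical_and(~pred, truth).sum())   # building missed

    # Guard against empty masks so no division by zero occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy 2x3 masks (made-up values, only to exercise the function).
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
truth = np.array([[1, 0, 0],
                  [0, 1, 1]])
metrics = mask_metrics(pred, truth)
print(metrics)
```

Averaging the per-image IoU values over a test set would give an mIoU figure of the kind reported above, under the assumption that the mean is taken per image; the abstract does not state how the averaging was done.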
Khiewwan, K., Asavasuthirakul, D., & Chimlek, S. (2025). Integrating RGB and DSM Data for Enhanced Building Segmentation in UAV Images. Current Applied Science and Technology, e0265709. https://doi.org/10.55003/cast.2025.265709


https://cast.kmitl.ac.th/doi/10.55003/cast.2025.265709