Journal of East China Normal University (Natural Science) ›› 2025, Vol. 2025 ›› Issue (4): 49-60. doi: 10.3969/j.issn.1000-5641.2025.04.005



C-T Net: Remote sensing image change detection model integrating CNN and Transformer

Yi WU1,2, Shilin YUN1

  1. School of Electronic Information Engineering, Hebei University of Technology, Tianjin 300401, China
    2. National Experimental Teaching Demonstration Center of Electronics and Communication Engineering, Hebei University of Technology, Tianjin 300401, China
  • Received: 2023-10-13  Accepted: 2024-04-19  Online: 2025-07-25  Published: 2025-07-19
  • About the author: WU Yi, female, professor; her research focuses on the study and application of intelligent control systems. E-mail: wuyihbgydx@163.com
  • Supported by: National Natural Science Foundation of China (51977059); Natural Science Foundation of Hebei Province (E2020202042)


Abstract:

Bi-temporal remote sensing images often exhibit various pseudo-changes caused by differences in acquisition time, viewing angle, and sensor characteristics, along with changes that are not of interest. Moreover, the location of a change is usually correlated with surrounding objects, and a fully convolutional network (FCN) loses this long-range information. To address these issues, this study proposes C-T Net, a network that integrates convolutional neural networks (CNN) and the Transformer. The overall architecture consists of a deep feature extraction part and a detection head. The backbone combines a CNN with a Swin Transformer, and two fusion modules, C-to-T and T-to-C, are designed to aggregate local and global features. The detection head applies Transformer encoding and decoding to derive refined feature maps for discriminating changed regions. In comparative experiments with multiple change detection models, the proposed method achieves the highest change-class scores on both the LEVIR-CD and WHU-CD datasets, with $F1_1$ of 90.63% and 86.24% and $\mathrm{IoU}_1$ of 82.87% and 75.81%, respectively. Results on the two datasets show that the model outperforms existing methods in terms of both visual quality and quantitative metrics.
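The abstract names the C-to-T and T-to-C fusion modules without detailing their design. Below is a minimal PyTorch sketch of one plausible reading: C-to-T projects the CNN feature map into the token space of the Transformer branch, while T-to-C folds global tokens back into the convolutional feature map. All class names, layer choices, and shapes here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CToT(nn.Module):
    """C-to-T (hypothetical sketch): inject local CNN features into the token stream."""
    def __init__(self, cnn_channels: int, token_dim: int):
        super().__init__()
        self.proj = nn.Conv2d(cnn_channels, token_dim, kernel_size=1)
        self.norm = nn.LayerNorm(token_dim)

    def forward(self, cnn_feat: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # cnn_feat: (B, C, H, W); tokens: (B, H*W, D)
        injected = self.proj(cnn_feat).flatten(2).transpose(1, 2)  # (B, H*W, D)
        return self.norm(tokens + injected)

class TToC(nn.Module):
    """T-to-C (hypothetical sketch): fold global tokens back into the CNN feature map."""
    def __init__(self, token_dim: int, cnn_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(token_dim, cnn_channels, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * cnn_channels, cnn_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(cnn_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, tokens: torch.Tensor, cnn_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = cnn_feat.shape
        # Reshape the token sequence back onto the spatial grid: (B, D, H, W).
        grid = tokens.transpose(1, 2).reshape(b, -1, h, w)
        return self.fuse(torch.cat([cnn_feat, self.proj(grid)], dim=1))

# Shape check with dummy inputs (B=2, C=64, D=96, 32x32 grid).
if __name__ == "__main__":
    feat, toks = torch.randn(2, 64, 32, 32), torch.randn(2, 32 * 32, 96)
    toks = CToT(64, 96)(feat, toks)    # tokens enriched with local detail
    feat = TToC(96, 64)(toks, feat)    # feature map enriched with global context
    print(toks.shape, feat.shape)      # torch.Size([2, 1024, 96]) torch.Size([2, 64, 32, 32])
```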

Key words: multi-temporal, change detection, convolutional neural networks (CNN), transformer, feature fusion
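For reference, $F1_1$ and $\mathrm{IoU}_1$ in the abstract denote the F1 score and intersection-over-union computed for the "changed" class (label 1) of the binary change map. The sketch below uses the standard definitions of these metrics; it is illustrative and not the paper's evaluation code.

```python
import numpy as np

def change_class_metrics(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """F1_1 and IoU_1 for binary change maps (0 = unchanged, 1 = changed)."""
    tp = np.sum((pred == 1) & (gt == 1))  # changed pixels correctly detected
    fp = np.sum((pred == 1) & (gt == 0))  # false alarms
    fn = np.sum((pred == 0) & (gt == 1))  # missed changes
    eps = 1e-10  # guard against division by zero
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    return f1, iou
```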
