Editorial Office, 火灾科学（中英文） (Fire Science, Chinese and English edition)

Tang Yingjie, Adeel Akram, Zhang Qixing, Zhang Yongming, Wang Jinjun. Early detection of forest fire smoke based on visual Transformer [J]. 火灾科学 (Fire Science), 2025, 34(3): 170-181.

Early detection of forest fire smoke based on visual Transformer

DOI: 10.3969/j.issn.1004-5309.2025.03.02

Funding: National Key Research and Development Program of China (2021YFC3000300); Anhui Province Major Special Project (202203a07020017)

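Author	Affiliation
Tang Yingjie (唐颖捷)	1. State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei 230026
Adeel Akram	1. State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei 230026; 2. Intelligent Perception and Computing Laboratory, Institute of Advanced Technology, University of Science and Technology of China, Hefei 230031
Zhang Qixing (张启兴)^^	1. State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei 230026
Zhang Yongming (张永明)	1. State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei 230026
Wang Jinjun (王进军)	1. State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei 230026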

Chinese keywords: 森林火灾; 深度学习; 视觉Transformer; 多层次提取; 重交融注意力机制

English keywords: Forest fire; Deep learning; Vision Transformer; Multi-level extraction; Reintegration attention

Chinese abstract (English translation):

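Forest fires occur frequently around the world and cause serious damage to ecosystems and public safety. In recent years, methods based on convolutional neural networks (CNNs) have been widely used for forest fire detection, but they suffer from restricted receptive fields and insufficient feature extraction capability. To address these shortcomings and detect forest fires as early as possible, this work targets forest fire smoke, since smoke usually appears before flames, and proposes an early forest fire detection algorithm based on a visual Transformer network (ForestSmoke ViT, abbreviated FSViT). First, because the position of smoke in an image is uncertain, the image partitioning of the visual Transformer network is changed to overlapping patch partitioning, which gives the network more semantic information and improves its ability to understand and capture pixels at patch edges. Second, because smoke of different sizes may appear in forest scenes, features are extracted from the input at multiple levels so that smoke targets of different sizes can be perceived, alleviating the limited receptive field of CNNs. In addition, because smoke is semi-transparent and sometimes hard to distinguish from the background, a Reintegration attention mechanism is designed to exchange information among feature maps of the same size and of different sizes, improving the network's feature extraction capability. The proposed algorithm reaches an accuracy of 95.36% on the test set and a recall of 92.74% on the small-target test set, clearly outperforming all compared networks and making it better suited to early forest fire detection.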
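The overlapping patch partitioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the FSViT implementation: the function names, the 8x8 toy image, and the patch size and stride values are all assumptions chosen for illustration. The only idea taken from the abstract is that patches overlap, which here is modeled by using a stride smaller than the patch size.

```python
def patch_starts(length, patch, stride):
    """Offsets at which a patch of size `patch` starts along one image side."""
    if patch > length:
        raise ValueError("patch larger than image side")
    return list(range(0, length - patch + 1, stride))


def split_patches(image, patch, stride):
    """Split a 2D image (list of rows) into square patches, row-major order.

    stride == patch gives the non-overlapping split of a standard ViT;
    stride < patch gives overlapping patches (hypothetical settings).
    """
    rows = patch_starts(len(image), patch, stride)
    cols = patch_starts(len(image[0]), patch, stride)
    return [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r in rows
        for c in cols
    ]


# Toy 8x8 "image" whose pixel value encodes its (row, col) position.
image = [[10 * r + c for c in range(8)] for r in range(8)]

plain = split_patches(image, patch=4, stride=4)        # 2 x 2 = 4 patches
overlapping = split_patches(image, patch=4, stride=2)  # 3 x 3 = 9 patches
print(len(plain), len(overlapping))
```

In the non-overlapping split, the pixel at row 3, column 3 sits in the corner of a single patch. With a stride of 2 it falls inside four patches, so content near a patch border is also seen away from the border in a neighboring patch.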

English abstract:

In recent years, methods based on convolutional neural networks (CNNs) have been widely applied to forest fire detection, but these approaches suffer from restricted receptive fields and insufficient feature extraction capability. To detect forest fires earlier, this study targets smoke, which often appears before flames in forests, and proposes an early forest fire detection algorithm based on a visual Transformer network (ForestSmoke ViT, abbreviated as FSViT). First, because the location of smoke in an image is uncertain, the patch partitioning of the visual Transformer network is changed to an overlapping partitioning method. This provides the network with richer semantic information and improves its ability to understand and capture pixels at patch edges. Second, because forest fire smoke varies in size, features are extracted from the input at multiple levels, enabling the detection of smoke targets at different scales and mitigating the limited receptive field of CNNs. In addition, because smoke is semi-transparent and sometimes hard to distinguish from the background, a Reintegration attention mechanism is designed. It exchanges information among feature maps of the same size and of different sizes, thereby enhancing the network's feature extraction capability. The proposed algorithm achieved an accuracy of 95.36% on the test set and a recall of 92.74% on the small-object test set, outperforming all compared networks and making it better suited to early-stage forest fire detection.
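For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and recall from confusion-matrix counts using the standard definitions. The abstract does not state how the authors computed them, so the definitions are an assumption, and the counts are invented purely for illustration; they are not results from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all images that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Fraction of true smoke images that the detector found."""
    return tp / (tp + fn)


# Hypothetical counts for a smoke / no-smoke test set (not from the paper).
tp, tn, fp, fn = 90, 85, 15, 10

print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"recall   = {recall(tp, fn):.4f}")
```

Recall is the natural metric to report for small, early smoke targets, because a missed detection (a false negative) delays the alarm, whereas accuracy also rewards correctly rejected smoke-free images.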
