A Study on an Improved Inception Architecture Integrating SE Attention Mechanism for Image Classification
DOI: https://doi.org/10.62381/I255206
Author(s)
Zihao Yan, Yubo Liu, Yang Zhang, Zhengxin Liu*, Jingjing Hou, Cewu Hang
Affiliation(s)
Xi’an Mingde Institute of Technology, Xi’an, Shaanxi, China
*Corresponding Author
Abstract
Image classification, one of the core tasks in computer vision, plays a vital role in real-world applications such as medical diagnosis, autonomous driving, security surveillance, and industrial inspection. With the development of deep neural networks, the Inception module has become a widely adopted structural unit in image classification networks because its multi-scale convolutional branches offer strong feature extraction at high computational efficiency. However, when dealing with complex images, traditional Inception structures often pay insufficient attention to discriminative regions and produce redundant features, which limits further improvements in classification performance. To address these issues, this paper proposes an improved Inception architecture that integrates the Squeeze-and-Excitation (SE) channel attention mechanism. Based on this enhanced module, a lightweight image classification network named FruitNet is designed. By inserting SE modules after each level of Inception modules, the network dynamically recalibrates channel features, emphasizing the key channels and thereby enhancing the model's discriminative power and robustness. To validate the proposed approach, extensive experiments were conducted on the standard image classification dataset CIFAR-10. The results show that FruitNet significantly improves classification accuracy while maintaining low computational complexity, achieving a good trade-off between lightweight model design and feature representation capability. This study provides a new structural perspective and technical support for the design and deployment of efficient image classification models.
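The channel recalibration described in the abstract can be illustrated with a minimal sketch of a generic SE step. This is not FruitNet's implementation: the function name `se_recalibrate`, the bias-free fully connected layers, the reduction ratio of 2, and the random weights are all assumptions made for illustration. The input stands in for the concatenated output of an Inception module's branches.

```python
import math
import random


def se_recalibrate(feature_maps, w1, w2):
    """Generic squeeze-and-excitation step (illustrative, bias-free).

    feature_maps: list of C channels, each an H x W grid (list of rows).
    w1: (C/r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C/r) weights of the expansion layer (hypothetical values).
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation, part 1: reduce C -> C/r, then apply ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, part 2: expand C/r -> C, then apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_maps)
    ]
    return scaled, gates


# Toy input: 4 channels of size 2 x 2, reduction ratio r = 2 (assumed).
random.seed(0)
channels, reduced = 4, 2
x = [[[random.random() for _ in range(2)] for _ in range(2)]
     for _ in range(channels)]
w1 = [[random.uniform(-1, 1) for _ in range(channels)] for _ in range(reduced)]
w2 = [[random.uniform(-1, 1) for _ in range(reduced)] for _ in range(channels)]

y, gates = se_recalibrate(x, w1, w2)
print("channel gates:", [round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so a channel is attenuated in proportion to how little the learned weights favor it. In a trained network the two weight matrices would be learned jointly with the convolutional layers.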
Keywords
Image Classification; Inception Module; Squeeze-and-Excitation (SE) Mechanism; Lightweight Network; FruitNet