MA-ViT: An Ore Classification Method Based on Attention Mechanism and Class-Balanced Learning
DOI: https://doi.org/10.62381/I255C01
Author(s)
Ziyang Qin1, Wanwan Wang2
Affiliation(s)
1School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, Henan, China
2iFLYTEK Co., Ltd., Hefei, Anhui, China
Abstract
Fine-grained ore classification is important for geological exploration and mineral processing, but traditional manual identification is inefficient and relies heavily on expert experience. Existing deep learning-based methods often extract texture features insufficiently and miss a large share of minority-class samples when images have complex field backgrounds and the class distribution is severely imbalanced. To address these issues, this paper proposes MA-ViT (Mixed-Attention Vision Transformer), an ore classification model based on a mixed attention mechanism and class-balanced learning. The method uses the Vision Transformer (ViT) as its backbone network. First, it introduces the Convolutional Block Attention Module (CBAM) to suppress background noise and focus on key ore texture features. Second, it uses a class-weighted loss function to mitigate the model bias caused by data imbalance. On an image dataset containing seven types of ores, MA-ViT achieves an overall accuracy of 95.61% and a macro-averaged F1 score of 0.9435, outperforming mainstream models such as ResNet-50 and the original ViT. In particular, for the sample-scarce "Muscovite" category, recall increases from 80.00% in the baseline model to 91.43%, achieving a balance between high precision and high recall. The proposed method demonstrates strong robustness and generalization ability and provides an effective reference for automatic ore identification in complex environments.
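The abstract does not give the exact form of the class-weighted loss, so the following is only a minimal sketch of one common variant: cross-entropy scaled by inverse-frequency class weights. The weighting formula, the function names, and the per-class sample counts are all assumptions made for illustration and are not taken from the paper.

```python
import math

def inverse_frequency_weights(class_counts):
    """Assumed weighting scheme: w_c = N / (K * n_c), so rarer classes get larger weights."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * count) for count in class_counts]

def weighted_cross_entropy(logits, label, weights):
    """Class-weighted cross-entropy for one sample: -w[label] * log(softmax(logits)[label])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_prob = logits[label] - log_sum_exp
    return -weights[label] * log_prob

# Hypothetical sample counts for seven ore classes (not from the paper);
# the last class plays the role of a sample-scarce category.
counts = [400, 350, 300, 250, 200, 150, 50]
weights = inverse_frequency_weights(counts)

# The same prediction error costs more on the rare class than on the common one.
logits = [0.0] * 7
loss_common = weighted_cross_entropy(logits, 0, weights)
loss_rare = weighted_cross_entropy(logits, 6, weights)
```

With this scheme, a misclassified sample from a rare class contributes more to the training loss than one from a frequent class, which is the general mechanism by which a class-weighted loss can raise minority-class recall.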
Keywords
Fine-grained Classification; Attention Mechanism; Class-balanced Learning; Ore Identification; Vision Transformer