Challenges and Technical Solutions to the Construction of Chinese-English Parallel Corpus for Ancient Chinese Painting & Calligraphy
DOI: https://doi.org/10.62381/ACS.HSMS2026.13
Author(s)
Hanmin Zhang1, Qiang Zhang2,*
Affiliation(s)
1Office of Scientific Research, Wuhan City Polytechnic, Wuhan, Hubei, China
2School of Foreign Languages, Central China Normal University, Wuhan, Hubei, China
*Corresponding Author
Abstract
The cross-cultural translation and intelligent understanding of ancient Chinese painting and calligraphy are severely constrained by a shortage of high-quality, domain-specific parallel corpora. The major bottlenecks are the cultural semantic gap in professional terminology, the scarcity of parallel data in this low-resource domain, multi-modal text-image misalignment, and the lack of standardized annotation specifications. Focusing on painting and calligraphy literature from the Tang, Song, Yuan, Ming and Qing dynasties, this paper systematically analyzes the key technical difficulties in constructing a Chinese-English parallel corpus. It proposes a comprehensive technical framework that integrates domain knowledge engineering, semi-automated alignment, multi-modal feature fusion and knowledge-enhanced annotation. The framework optimizes the whole workflow of data collection, intelligent cleaning, text preprocessing and hierarchical annotation, and addresses the cultural semantic mismatch and low automation of traditional corpus construction. Quantitative experiments verify that the proposed hybrid alignment and knowledge-enhanced annotation mechanisms significantly improve bilingual sentence alignment accuracy and domain terminology consistency. This research provides standardized, high-quality data resources for cross-cultural translation, intelligent retrieval and machine learning in ancient Chinese art, and supports the international dissemination of traditional Chinese art heritage.
Keywords
Parallel Corpus; Ancient Chinese Painting and Calligraphy; Cultural Semantic Gap; Multi-Modal Alignment; Knowledge-Enhanced Annotation