Opening the Black Box: Visual Analytics of a Cognition-Guided Multimodal Fusion Model for Classroom Engagement Assessment
DOI: https://doi.org/10.62381/H261305
Author(s)
Min Song, Zhang Wang*, Junyi Chai
Affiliation(s)
Faculty of Information Engineering, College of Science and Technology, Ningbo University, Ningbo, China
*Corresponding Author
Abstract
Multimodal deep learning models have achieved high accuracy in automated student engagement assessment, yet their internal mechanisms remain opaque to educators, limiting trust and practical adoption. This paper presents a visual analytics study of a cognition-guided multimodal fusion model (MCA Fusion) previously validated for high predictive performance. We propose an interpretability framework comprising PCA-based latent space visualization, cross-modal attention weight analysis, and case-level interpretation. Using real classroom data from 36 undergraduates (EEG, facial expression, and body posture signals), we reveal three key insights: (1) the model learns a latent space in which engaged and not-engaged samples form distinct clusters (silhouette coefficient 0.43 vs. 0.12 for an early fusion baseline); (2) evaluated with a linear mixed-effects model (LMM), the cognition-guided attention mechanism dynamically assigns significantly higher weights to facial and posture features when students are in a verified state of true engagement (p < 0.001), consistent with Fredricks' multidimensional engagement theory; (3) boundary misclassifications occur in genuinely ambiguous situations, where moderate attention weights indicate model uncertainty. These visualizations can inform teacher-facing dashboards and model refinement. Because no formal user evaluation was conducted, practical utility remains to be tested. This work provides a methodological blueprint for interpretable, theory-aligned multimodal learning analytics.
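For readers who want to probe similar models, the sketch below illustrates the two quantitative analyses summarized above: a PCA projection of fused latent representations with a silhouette-coefficient check of cluster separation, and a linear mixed-effects model relating attention weights to verified engagement. This is a minimal illustration, not the paper's released code: the variable names (embeddings, labels, facial_weight, student) are hypothetical, the data are random stand-ins, and the use of scikit-learn and statsmodels (in place of lme4) is an assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Stand-in data: fused latent vectors for N samples and binary engagement
# labels; in the actual study these would come from the MCA Fusion model.
N, D = 200, 64
embeddings = rng.normal(size=(N, D))
labels = rng.integers(0, 2, size=N)  # 1 = engaged, 0 = not engaged

# (1) PCA latent-space visualization and cluster separation.
# Silhouette is computed here on the 2-D projection; the paper does not
# specify whether it used the projection or the full latent vectors.
proj = PCA(n_components=2).fit_transform(embeddings)
print(f"silhouette coefficient: {silhouette_score(proj, labels):.2f}")

# (2) Linear mixed-effects model of attention weights, mirroring an
# lme4-style analysis: fixed effect for verified engagement state,
# random intercept per student (36 undergraduates).
df = pd.DataFrame({
    "facial_weight": rng.uniform(size=N),  # attention weight on the facial modality
    "engaged": labels,
    "student": rng.integers(0, 36, size=N),
})
lmm = smf.mixedlm("facial_weight ~ engaged", df, groups=df["student"]).fit()
print(lmm.summary())  # inspect the 'engaged' coefficient and its p-value
```

With real embeddings and attention weights in place of the random stand-ins, a significantly positive coefficient on engaged would correspond to insight (2), and a higher silhouette coefficient for the fusion model than for an early fusion baseline would correspond to insight (1).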
Keywords
Multimodal Learning Analytics; Student Engagement; Interpretability; Visual Analytics; Attention Mechanism; EEG.
References
[1] I. Possaghi, B. Vesin, F. Zhang, K. Sharma, C. Knudsen, and H. Bjørkum, “Integrating multi-modal learning analytics dashboard in K-12 education: insights for enhancing orchestration and teacher decision-making,” Smart Learn. Environ., vol. 12, p. 53, 2025, doi: 10.1186/s40561-025-00410-4.
[2] M. Song, I. G. P. Sudiarta, P. K. Nitiasih, P. N. Riastini, Z. Wang, and J. Chai, “Multimodal assessment of student engagement by fusing EEG, facial expressions, and body posture in an offline classroom,” Int. J. Mod. Educ. Comput. Sci., in press, 2026.
[3] H. Khosravi, S. Buckingham Shum, G. Chen, and C. Conati, “Explainable Artificial Intelligence in education,” Comput. Educ. Artif. Intell., vol. 3, p. 100074, 2022, doi: 10.1016/j.caeai.2022.100074.
[4] A. B. Arrieta et al., “Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI,” Inf. Fusion, vol. 58, pp. 82–115, 2020, doi: 10.1016/j.inffus.2019.12.012.
[5] M. Mohammadi, E. Tajik, R. Martinez-Maldonado, S. Sadiq, W. Tomaszewski, and H. Khosravi, “Artificial intelligence in multimodal learning analytics: A systematic literature review,” Comput. Educ. Artif. Intell., vol. 8, p. 100426, 2025, doi: 10.1016/j.caeai.2025.100426.
[6] E. Fan, M. Bower, and J. Siemon, “From heartbeats to actions: Multimodal learning analytics of cognitive and behavior engagement in real classrooms,” Learn. Instr., vol. 103, p. 102325, 2026, doi: 10.1016/j.learninstruc.2026.102325.
[7] J. Sabio, N. S. Williams, G. M. McArthur, and N. A. Badcock, “A scoping review on the use of consumer-grade EEG devices for research,” PLoS One, vol. 19, no. 3, p. e0291186, 2024, doi: 10.1371/journal.pone.0291186.
[8] A. E. Albaiati et al., “Deep Learning Approaches for EEG-Based Biometrics: A Systematic Review,” IEEE Access, vol. 13, pp. 171025–171047, 2025, doi: 10.1109/ACCESS.2025.3605614.
[9] A. Manoharan, “Multimodal Engagement Recognition From Image Traits Using Deep Learning Techniques,” IEEE Access, vol. 12, pp. 25228–25244, 2024, doi: 10.1109/ACCESS.2024.3353053.
[10] I. Qarbal, N. Sael, and S. Ouahabi, “Student’s Engagement Detection Based on Computer Vision: A Systematic Literature Review,” IEEE Access, vol. 13, pp. 140519–140545, 2025, doi: 10.1109/ACCESS.2025.3596885.
[11] S. Sathyanarayanan and B. R. Tantri, “Confusion Matrix-Based Performance Evaluation Metrics,” African J. Biomed. Res., vol. 27, no. 4S, pp. 4023–4031, 2024, doi: 10.53555/AJBR.v27i4S.4345.
[12] I. T. Jolliffe and J. Cadima, “Principal component analysis: a review and recent developments,” Philos. Trans. R. Soc. A Math. Phys. Eng. Sci., vol. 374, no. 2065, p. 20150202, 2016, doi: 10.1098/rsta.2015.0202.
[13] D. Bates, M. Mächler, B. M. Bolker, and S. C. Walker, “Fitting Linear Mixed-Effects Models Using lme4,” J. Stat. Softw., vol. 67, no. 1, pp. 1–48, 2015, doi: 10.18637/jss.v067.i01.
[14] V. Hassija, V. Chamola, A. Mahapatra, A. Singal, D. Goel, and K. Huang, “Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence,” Cognit. Comput., vol. 16, no. 1, pp. 45–74, 2024, doi: 10.1007/s12559-023-10179-8.
[15] J. A. Fredricks, M. Filsecker, and M. A. Lawson, “Student Engagement, Context, and Adjustment: Addressing Definitional, Measurement, and Methodological Issues,” Learn. Instr., vol. 43, pp. 1–4, 2016.
[16] T. Nazaretsky, M. Ariely, M. Cukurova, and G. Alexandron, “Teachers’ trust in AI-powered educational technology and a professional development program to improve it,” Br. J. Educ. Technol., vol. 53, no. 4, pp. 914–931, 2022, doi: 10.1111/bjet.13232.
[17] K. Holstein, B. M. McLaren, and V. Aleven, “Co-Designing a Real-Time Classroom Orchestration Tool to Support Teacher–AI Complementarity,” J. Learn. Anal., vol. 6, no. 2, pp. 27–52, 2019, doi: 10.18608/jla.2019.62.3.
[18] L. van der Maaten and G. Hinton, “Visualizing Data using t-SNE,” J. Mach. Learn. Res., vol. 9, no. 86, pp. 2579–2605, 2008.