Construction and Sharing Mechanism of Bioinformatics Databases: Data Governance Strategies for Cross-Species Comparative Genomics
DOI: https://doi.org/10.62381/ACS.FSSD2025.07
Author(s)
Yu Ding
Affiliation(s)
Jintan No.1 High School, Changzhou, China
Abstract
With the rapid development of high-throughput sequencing technology, the volume of bioinformatics data has grown explosively, and cross-species comparative genomics has become an important means of uncovering the principles of evolution, gene function, and disease mechanisms. However, data heterogeneity, privacy-protection requirements, and technical barriers have created widespread data silos that restrict scientific collaboration and innovation. This paper systematically examines the core issues of data governance for cross-species comparative genomics along three dimensions: database architecture design, data governance strategies, and innovation in sharing mechanisms. It proposes solutions based on modular design, distributed storage, and a dynamic governance framework. It then demonstrates their feasibility through representative case studies from around the world, providing theoretical support for biomedical data integration and open science.
Keywords
Bioinformatics Database; Cross-Species Comparative Genomics; Data Governance; Distributed Storage; Privacy Protection