Authors:
(1) Pavan L. Veluvali, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg;
(2) Jan Heiland, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg;
(3) Peter Benner, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg.
Summary and Outlook
Computational workflows have become the established way to carry out data- and software-intensive tasks. Their rapid uptake in computer-based experiments presents a crucial opportunity to advance the development of reproducible scientific software. As part of the MaRDI consortium [MaR21] on research data management in the mathematical sciences, we presented MaRDIFlow, a prototype of a novel computational workflow framework. It focuses on automating the abstraction of metadata embedded in an ontology of mathematical objects, while relegating the underlying execution and environment dependencies to multi-layered vertical descriptions. Additionally, the different components are characterized by their input-output relation, so that they can be used interchangeably and, in most cases, redundantly.
The design specification and the working prototype of our RDM tool were presented through different use cases. In its present version, MaRDIFlow acts as a command-line tool that enables users to handle workflow components as abstract objects described by their input-to-output behavior. At its core, MaRDIFlow ensures that the generated output is described in detail, and this comprehensive description facilitates the reproduction of computational experiments. We first illustrated the conversion rates of CO2 using a methanation reactor model, and then demonstrated the two-dimensional spinodal decomposition of a virtual A-B alloy using the Cahn-Hilliard model. Our RDM tool adheres to the FAIR principles, such that the abstracted workflow components are Findable, Accessible, Interoperable, and Reusable. Overall, the ongoing development of MaRDIFlow aims at covering heterogeneous use cases and at acting as a scientific tool in the field of mathematical sciences.
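The idea of handling components as abstract objects described by their input-to-output behavior can be illustrated with a minimal sketch. It is not MaRDIFlow's actual interface: the names `Component`, `interchangeable`, and `execute`, the metadata fields, and the toy squaring model are all assumptions made for illustration only.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical workflow component, described by its I/O relation."""
    name: str
    inputs: Tuple[str, ...]    # names of the quantities the component needs
    outputs: Tuple[str, ...]   # names of the quantities it produces
    run: Callable[[Dict[str, float]], Dict[str, float]]
    environment: str = "unspecified"  # free-text note, illustrative only

    def signature(self) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
        # Order-independent description of the input-output relation.
        return (tuple(sorted(self.inputs)), tuple(sorted(self.outputs)))

    def describe(self) -> str:
        # Metadata record without any executable content.
        record = {
            "name": self.name,
            "inputs": list(self.inputs),
            "outputs": list(self.outputs),
            "environment": self.environment,
        }
        return json.dumps(record, sort_keys=True)


def interchangeable(a: Component, b: Component) -> bool:
    """Two components can replace each other if their I/O relations match."""
    return a.signature() == b.signature()


def execute(component: Component, data: Dict[str, float]) -> Dict[str, float]:
    """Run a component after checking its declared inputs and outputs."""
    missing = [key for key in component.inputs if key not in data]
    if missing:
        raise KeyError(f"missing inputs for {component.name}: {missing}")
    result = component.run(data)
    absent = [key for key in component.outputs if key not in result]
    if absent:
        raise ValueError(f"{component.name} did not produce: {absent}")
    return result


# Two realizations of the same toy relation y = x^2: a formula and a table.
model = Component("formula", ("x",), ("y",), lambda d: {"y": d["x"] ** 2})
table = Component(
    "lookup",
    ("x",),
    ("y",),
    lambda d: {"y": {1.0: 1.0, 2.0: 4.0, 3.0: 9.0}[d["x"]]},
    environment="precomputed data",
)

print(interchangeable(model, table))
print(execute(model, {"x": 3.0}), execute(table, {"x": 3.0}))
print(table.describe())
```

In this sketch, the executable model and the precomputed table share one input-output signature, so either can stand in for the other in a larger workflow, while `describe` yields a metadata record that can be stored and searched independently of the code.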
Apart from this, we are also working towards developing an Electronic Lab Notebook (ELN) in order to visualize as well as execute the MaRDIFlow tool. The ELN will provide researchers with a user-friendly interface to interact with the tool efficiently and seamlessly. Lastly, although the present manuscript introduces our RDM tool as a working proof of concept, we plan to publish a detailed manuscript with technical details and use cases in the near future.
Acknowledgments
The authors are supported by NFDI4Cat and MaRDI, funded by the Deutsche Forschungsgemeinschaft (DFG), project 441926934 “NFDI4Cat – NFDI für Wissenschaften mit Bezug zur Katalyse” and project 460135501 “MaRDI – Mathematische Forschungsdateninitiative”.
Data Availability
Results presented in this work are part of an ongoing investigation; however, a working prototype with the second use case is available and documented at https://doi.org/10.5281/zenodo.10608764
References
[AGMT17] M. Atkinson, S. Gesing, J. Montagnat, and I. Taylor. Scientific workflows: Past, present and future, 2017.
[BCG+19] A. Brinckman, K. Chard, N. Gaffney, M. Hategan, M. B. Jones, K. Kowalik, S. Kulasekaran, B. Ludäscher, B. D. Mecum, J. Nabrzyski, V. Stodden, I. J. Taylor, M. J. Turk, and K. Turner. Computing environments for reproducibility: Capturing the “whole tale”. Future Generation Computer Systems, 94:854–867, 2019.
[BHBS21] J. Bremer, J. Heiland, P. Benner, and K. Sundmacher. Non-intrusive time-POD for optimal control of a fixed-bed reactor for CO2 methanation. IFAC-PapersOnLine, 54(3):122–127, 2021.
[BOA+11] T. Blochwitz, M. Otter, M. Arnold, C. Bausch, C. Clauß, H. Elmqvist, A. Junghanns, J. Mauss, M. Monteiro, T. Neidhold, et al. The functional mockup interface for tool independent exchange of simulation models. In Proceedings of the 8th International Modelica Conference, pages 105–114. Linköping University Press, 2011.
[BTK+21] M. Beg, J. Taka, T. Kluyver, A. Konovalov, M. Ragan-Kelley, NM. Thiéry, and H. Fangohr. Using Jupyter for reproducible scientific workflows. Computing in Science & Engineering, 23(2):36–46, 2021.
[CAI+22a] M. Crusoe, S. Abeln, A. Iosup, P. Amstutz, J. Chilton, N. Tijanić, H. Ménager, S. Soiland-Reyes, B. Gavrilović, C. Goble, et al. Methods included: Standardizing computational reuse and portability with the Common Workflow Language. Communications of the ACM, 65(6):54–63, 2022.
[CAI+22b] MR. Crusoe, S. Abeln, A. Iosup, P. Amstutz, J. Chilton, N. Tijanić, H. Ménager, S. Soiland-Reyes, B. Gavrilović, C. Goble, et al. Methods included: Standardizing computational reuse and portability with the Common Workflow Language. Communications of the ACM, 65(6):54–63, 2022.
[CH58] JW. Cahn and JE. Hilliard. Free energy of a nonuniform system. I. Interfacial free energy. The Journal of Chemical Physics, 28(2):258–267, 1958.
[Com22] The Galaxy Community. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Research, 50(W1):W345–W351, 04 2022.
[CSFG19] A. Clyburne-Sherin, X. Fei, and SA. Green. Computational reproducibility via containers in social psychology. Meta-Psychology, 3, 2019.
[DGST09] E. Deelman, D. Gannon, M. Shields, and I. Taylor. Workflows and e-science: An overview of workflow system features and capabilities. Future Generation Computer Systems, 25(5):528–540, 2009.
[DHM+20] A. Devaraju, R. Huber, M. Mokrane, P. Herterich, L. Cepinskas, J. de Vries, H. L’Hours, J. Davidson, and Angus W. FAIRsFAIR data object assessment metrics, October 2020.
[FHHS16] J. Fehr, J. Heiland, C. Himpe, and J. Saak. Best practices for replicability, reproducibility and reusability of computer-based experiments exemplified by model reduction software. AIMS Mathematics, 1(3):261–281, 2016.
[For22] Deutsche Forschungsgemeinschaft. Guidelines for Safeguarding Good Research Practice. Code of Conduct, April 2022. Available in German and in English.
[GCBSR+20] C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, MR. Crusoe, K. Peters, and D. Schober. FAIR computational workflows. Data Intelligence, 2(1–2):108–121, 2020.
[HW09] M. A. Heroux and J. M. Willenbring. Barely sufficient software engineering: 10 practices to improve your CSE software. In Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering, pages 15–21, 2009.
[KRKP+16a] T. Kluyver, B. Ragan-Kelley, F. Pérez, B. Granger, M. Bussonnier, J. Frederic, K. Kelley, J. Hamrick, J. Grout, S. Corlay, P. Ivanov, D. Avila, S. Abdalla, and C. Willing. Jupyter notebooks – a publishing format for reproducible computational workflows. In F. Loizides and B. Schmidt, editors, Positioning and Power in Academic Publishing: Players, Agents and Agendas, pages 87–90, 2016.
[KRKP+16b] T. Kluyver, B. Ragan-Kelley, F. Pérez, B. Granger, M. Bussonnier, J. Frederic, K. Kelley, J. Hamrick, J. Grout, S. Corlay, P. Ivanov, D. Avila, S. Abdalla, and C. Willing. Jupyter notebooks – a publishing format for reproducible computational workflows. In F. Loizides and B. Schmidt, editors, Positioning and Power in Academic Publishing: Players, Agents and Agendas, pages 87–90. IOS Press, 2016.
[MaR21] MaRDI. Mathematical Research Data Initiative, 2021. URL: https://www.mardi4nfdi.de.
[Nat22] National Academies of Sciences, Engineering, and Medicine. Automated Research Workflows for Accelerated Discovery: Closing the Knowledge Discovery Loop. The National Academies Press, Washington, DC, 2022.
[PMBF17] JF. Pimentel, L. Murta, V. Braganholo, and J. Freire. noWorkflow: a tool for collecting, analyzing, and managing provenance from Python scripts. Proceedings of the VLDB Endowment, 10(12), 2017.
[PMBF21] JF. Pimentel, L. Murta, V. Braganholo, and J. Freire. Understanding and improving the quality and reproducibility of Jupyter notebooks. Empirical Software Engineering, 26(4):65, 2021.
[SM24] S. Samuel and D. Mietchen. Computational reproducibility of Jupyter notebooks from biomedical publications. GigaScience, 13:giad113, 01 2024.
[UHY+21] M. Uhrin, SP. Huber, J. Yu, N. Marzari, and G. Pizzi. Workflows in AiiDA: Engineering a high-throughput, event-based engine for robust and modular computational workflows. Computational Materials Science, 187:110086, 2021.
[VHB23] PL. Veluvali, J. Heiland, and P. Benner. MaRDIFlow: A workflow framework for documentation and integration of FAIR computational experiments. In Proceedings of the Conference on Research Data Infrastructure, volume 1, 2023.
[WDA+16] MD. Wilkinson, M. Dumontier, IJ. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, JW. Boiten, LB. da Silva Santos, PE. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific data, 3(1):1–9, 2016.
This paper is available on arXiv under the CC BY 4.0 DEED license.