Model for the evaluation of metadata quality: Proposal for open science management in Cuba
DOI:
https://doi.org/10.47909/978-9916-9974-5-1.97
Keywords:
metadata quality, open science, disambiguation of author names, data integration, data cleansing
Abstract
Evaluating metadata quality is vital to the management of open science (OS) in Cuba. The metadata used in open-access computational systems exhibit unresolved quality problems: incomplete records, ambiguous author names, null values, inconsistent use of data exchange formats, and the absence of procedures for metadata quality management. This paper therefore proposes a model for evaluating the quality of metadata associated with OS management in Cuba. The model comprises four stages. Stage 1 measures the identified quality dimensions. Stage 2 covers data cleaning and standardization. Stage 3 covers data integration. Stage 4 addresses data disambiguation based on open access criteria and standards. As a result, completeness at the record level and accuracy at the author name level were identified as the main quality dimensions, and possible duplicate elements were detected for subsequent integration. A case study is presented with two variant solutions, one for grouping synonymous author names and the other for disambiguating synonymous and homonymous author names, thus laying the foundations for the interoperability of computational systems.
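As an illustration only, the two dimensions named in the abstract (record-level completeness and author-name accuracy) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the field names, the integer weights, and the helper names `record_completeness`, `name_key`, and `group_name_variants` are all hypothetical, and the grouping key assumes names written as "Surname, Given".

```python
import unicodedata
from collections import defaultdict

# Hypothetical fields and weights; the abstract does not specify either.
WEIGHTS = {"title": 4, "creator": 3, "date": 2, "subject": 1}


def is_filled(value):
    """Treat None, blank strings, and empty lists as missing (null) values."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple)):
        return any(is_filled(v) for v in value)
    return True


def record_completeness(record, weights=WEIGHTS):
    """Weighted share of expected fields that carry a value, in [0, 1]."""
    total = sum(weights.values())
    filled = sum(w for field, w in weights.items() if is_filled(record.get(field)))
    return filled / total


def name_key(name):
    """Crude grouping key: accent-free lowercase surname plus first initial.

    Assumes the "Surname, Given" form; this is an illustrative heuristic.
    """
    surname, _, given = name.partition(",")
    folded = unicodedata.normalize("NFKD", surname)
    folded = "".join(c for c in folded if not unicodedata.combining(c))
    surname_key = " ".join(folded.replace("-", " ").lower().split())
    initial = given.strip()[:1].lower()
    return surname_key, initial


def group_name_variants(names):
    """Group spelling variants (synonyms) that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    record = {"title": "X", "creator": ["Díaz de la Paz, L."], "date": "", "subject": None}
    print(record_completeness(record))
    print(group_name_variants(
        ["Díaz-de-la-Paz, L.", "Díaz de la Paz, Lisandra", "Taboada Crispi, A."]
    ))
```

A key of this kind can only merge synonymous spellings. It cannot separate homonyms (different people who share a surname and initial), which would need additional evidence beyond the name string.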
License
Copyright (c) 2024 Lisandra Díaz de la Paz, Alberto Taboada Crispí, Amed Abel Leiva Mederos

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) which permits copying and redistributing the material in any medium or format, adapting, transforming and building upon the material as long as the license terms are followed.