Model for the evaluation of metadata quality: Proposal for open science management in Cuba

DOI:

https://doi.org/10.47909/978-9916-9974-5-1.97

Keywords:

metadata quality, open science, disambiguation of author names, data integration, data cleansing

Abstract

Evaluating metadata quality is vital to the management of open science (OS) in Cuba. The metadata used in open-access computational systems show unresolved quality problems: incomplete records, ambiguous author names, null values, inconsistent use of data exchange formats, and the absence of procedures for metadata quality management. This paper therefore proposes a model for evaluating the quality of metadata associated with OS management in Cuba. The model comprises four stages: (1) measurement of the identified quality dimensions; (2) data cleaning and standardization; (3) data integration; and (4) data disambiguation based on open-access criteria and standards. Completeness at the record level and accuracy at the author-name level were identified as the main quality dimensions, and possible duplicate elements were detected for subsequent integration. A case study is presented with two variant solutions, one for grouping synonymous author names and the other for disambiguating synonymous and homonymous author names, thus laying the foundations for the interoperability of computational systems.
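As a rough illustration of two ideas named in the abstract, record-level completeness and the grouping of synonymous author names, the following Python sketch may help. It is not the authors' implementation. The field names, the integer weights, and the helper names (`completeness`, `name_key`, `group_synonyms`) are all assumptions made for this example, and the grouping key is a deliberately crude stand-in for the case-study solutions.

```python
import unicodedata
from collections import defaultdict

# Hypothetical field weights (assumption: the abstract does not list any).
WEIGHTS = {"title": 4, "creator": 3, "date": 2, "subject": 1}


def completeness(record, weights=WEIGHTS):
    """Weighted share of fields that hold a non-empty value (0.0 to 1.0)."""
    total = sum(weights.values())
    filled = sum(
        w for field, w in weights.items()
        if str(record.get(field) or "").strip()
    )
    return filled / total


def name_key(name):
    """Crude grouping key: accent-folded, lowercased surname plus first initial."""
    folded = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode()
        .lower()
    )
    if "," in folded:
        # "Surname, Given" form
        surname, given = [part.strip() for part in folded.split(",", 1)]
    else:
        # "Given Surname" form: treat the last token as the surname
        parts = folded.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    surname = surname.replace("-", " ")
    return f"{surname}|{given[:1]}"


def group_synonyms(names):
    """Group name variants that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    record = {"title": "A study", "creator": "", "date": None, "subject": "OS"}
    print(completeness(record))
    print(group_synonyms(
        ["Díaz-de-la-Paz, L.", "Diaz de la Paz, L.", "Taboada-Crispi, A."]
    ))
```

A key of this kind can only merge spelling variants (synonyms). It cannot separate homonyms, that is, different people who share a name, which is why the disambiguation stage would need further evidence such as affiliations or co-authors.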

Published

17-10-2024

How to Cite

Díaz de la Paz, L., Taboada Crispí, A., & Leiva Mederos, A. A. (2024). Model for the evaluation of metadata quality: Proposal for open science management in Cuba. Advanced Notes in Information Science, 6, 100–113. https://doi.org/10.47909/978-9916-9974-5-1.97