A methodology for developing dermatological datasets: lessons from retrospective data collection for AI-based applications
Journal: BMC Medical Research Methodology
Volume: 25
Issue: 1
Publication type: ISI
Abstract
Purpose: The integration of artificial intelligence into dermatological research has underscored the need for robust and well-structured dermatological datasets. However, these datasets vary widely in their development processes, and there is currently no standard methodology for creating them. This work identifies three pressing needs in building dermatological datasets focused on skin tumor classification: the need for multimodal datasets, the definition of minimum metadata requirements, and the inclusion of underrepresented populations to address the scarcity of health data.

Methods: We propose a practical methodology to create dermatological datasets from clinical records, incorporating both images and patient metadata. The process consists of four key stages: obtaining institutional review board approval and analyzing clinical information sources, data recording and structuring, processing of clinical data and images, and quality assessment. This methodology was derived from hands-on experience in building two datasets, one from a Chilean population and one from a Mexican population.

Results: The methodology allows the creation of well-structured datasets by simplifying data organization and enabling replication. Each step includes practical guidance for dealing with typical challenges, such as image metadata categorization and technical validation by dermatologists and computer scientists.

Conclusion: Our contribution offers a reproducible, scalable, and interdisciplinary framework for creating dermatological datasets, especially useful for countries initiating dataset creation. In addition to the methodological proposal, we highlight common pitfalls and offer recommendations to mitigate them.
