AI bias on black skin diseases – Fitzpatrick 6 skin types
Artificial Intelligence (AI) Research
Why datasets used in training Dermatological Artificial Intelligence applications need to be diverse across skin types and geographies
Artificial Intelligence (AI) in healthcare has gained momentum, with applications spanning improved disease diagnosis (e.g. in oncology and dermatology), support for reading radiology images, symptom checkers, and biopharmaceutical development, among others. The value of AI in healthcare is immense, with advantages in reducing the cost of accessing medical care, increasing diagnostic accuracy, and promoting disease prevention. Studies have already produced evidence of this, with chatbots shown to improve outcomes when used in managing mental health conditions.
However, a persistent dilemma surrounds the applicability of these AI-supported tools in African settings: the development and training of the underlying algorithms often happens in Western countries, which remains a major barrier to their adoption in Africa. A number of reports have echoed the non-inclusiveness of AI tools, with a strong bias toward high performance on Caucasian skin compared to other populations. For example, a blog post by Florian Dietz highlights examples of AI bias against black-skinned people. Against that background, we carried out a study to assess the performance of an AI algorithm developed by First Derm, a deep convolutional neural network (CNN) called “Skin Image Search”, on dermatological conditions in Fitzpatrick type 6 (black/dark) skin.
We analyzed 123 dermatological skin images retrospectively extracted, with consent, from the electronic database of a Ugandan telehealth company, The Medical Concierge Group (TMCG). The predictability of the AI app was graded on a scale of 0 to 5, where 0 meant no correct prediction was made and 1 to 5 indicated decreasing correctness of the prediction of the skin condition and body part. The article was published in bioRxiv.
Dermatological images were uploaded for automated classification using an online version of the AI app, which required two images: one showing the wider body area where the lesion is situated, and a second close-up photo to allow classification, as illustrated in Figure 1 below. The diagnosis generated by the AI was benchmarked against the clinicians’: each image was reviewed by 3 independent general practitioners, and the final clinical diagnosis was the one most commonly reported. In instances of disagreement, the final diagnosis was reached by consensus.
Figure 1: Illustration of how the Skin Image Search application was used.
The top 5 differential diagnoses returned by the AI app for each tested image were imported into a Microsoft Excel sheet. Matching the image’s clinical diagnosis against the returned top 5, each classification was given a score from 1 to 5 depending on the position of the confirmed diagnosis, with a score of 1 meaning the AI ranked it most likely. If the confirmed diagnosis was absent from the top 5, the classification was given a score of 0.
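The scoring rule above can be sketched in a few lines of Python (the function and diagnosis names are illustrative, not taken from the study’s actual tooling):

```python
def score_prediction(clinical_dx, top5):
    """Score one AI classification against the clinician consensus diagnosis.

    Returns the 1-based rank (1-5) of the confirmed clinical diagnosis within
    the app's top-5 differentials (1 = ranked most likely by the AI), or 0 if
    the confirmed diagnosis does not appear in the top 5 at all.
    """
    for position, dx in enumerate(top5, start=1):
        if dx.strip().lower() == clinical_dx.strip().lower():
            return position
    return 0

# Example: the confirmed diagnosis "eczema" appears second in the AI's top 5
score_prediction("Eczema", ["acne", "eczema", "psoriasis", "tinea corporis", "urticaria"])  # 2
```

Note that under this scheme any non-zero score counts as a “correct” prediction; the score itself records how highly the AI ranked the right answer.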
The overall diagnostic accuracy of the AI app was analyzed, as well as the diagnostic accuracy for individual diagnoses and for diagnostic groups sharing a common etiology (e.g. tumors, viral diseases and fungal diseases) or body site (e.g. genital diseases and facial diseases). Overall diagnostic accuracy was low at 17% (i.e. 21 out of 123 images correctly predicted), with the following breakdown by score: 1 – 8.9%, 2 – 2.4%, 3 – 2.4%, 4 – 1.6%, 5 – 1.6%. Performance by individual diagnosis was highest for dermatitis skin conditions (80%) and lowest (0%) for fungal skin conditions, even though these had the highest count. Out of the 123 images uploaded, the AI app returned a diagnosis for 62% of all body parts (8/13). The AI app performed well on dermatological images from the face, trunk and genital areas, and worst on lower-limb images.
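To show how the reported figures relate to the per-image scores, the sketch below aggregates a score distribution consistent with the percentages above (the counts are back-calculated from the reported percentages of 123 images, not taken from the study’s raw data):

```python
from collections import Counter

def summarize(scores):
    """Aggregate per-image scores, where 0 = no match and 1-5 = rank of the match."""
    n = len(scores)
    counts = Counter(scores)
    # Any image whose confirmed diagnosis appears in the top 5 counts as correct.
    accuracy = sum(counts[s] for s in range(1, 6)) / n
    breakdown = {s: round(100 * counts[s] / n, 1) for s in range(1, 6)}
    return accuracy, breakdown

# Back-calculated counts: 11 images scored 1, 3 scored 2, 3 scored 3,
# 2 scored 4, 2 scored 5, and the remaining 102 scored 0.
scores = [1] * 11 + [2] * 3 + [3] * 3 + [4] * 2 + [5] * 2 + [0] * 102
accuracy, breakdown = summarize(scores)
print(round(100 * accuracy))  # 17  (i.e. 21/123)
print(breakdown)              # {1: 8.9, 2: 2.4, 3: 2.4, 4: 1.6, 5: 1.6}
```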
These findings correlate with earlier reports of AI bias toward certain geographies, for example in voice recognition, where African voices and accents were noted to be excluded or poorly detected. Similar observations have been made about facial recognition algorithms, which struggle to recognize black faces. However, the fact that the AI app was able to return a diagnosis for 62% of all body parts (8/13) indicates that it had been trained on images from a variety of body sites.
In conclusion, AI has a place in diagnostic support for dermatological conditions; however, greater diversity in the images used to train CNNs is needed to reduce bias and achieve significantly better diagnostic accuracy.
The team that contributed to this work was multidisciplinary, comprising software engineers, researchers and clinicians. Key contributors include Louis Henry Kamulegeya, Mark Okello, Davis Musinguzi, John Mark Bwanika, William Lubega, Faith Nassiwa, Davis Rusoke and Alexander Börve. Special thanks to the clinicians at The Medical Concierge Group (TMCG) who reviewed cases for this study, and to the First Derm team for allowing us to beta-test the Skin Image Search application on black dermatological images.