The Medical Segmentation Decathlon

Antonelli, Michela; Annika, Reinke; Spyridon, Bakas; Keyvan, Farahani; Annette, Kopp-Schneider; Landman Bennett, A.; Geert, Litjens; Bjoern, Menze; Olaf, Ronneberger; Summers Ronald, M.; Van Ginneken Bram,; Michel, Bilello; Patrick, Bilic; Christ Patrick, F.; Do Richard, K. G.; Gollub Marc, J.; Heckers Stephan, H.; Henkjan, Huisman; Jarnagin William, R.; Mchugo Maureen, K.; Sandy, Napel; Golia, Pernicka Jennifer S.; Kawal, Rhode; Catalina, Tobon-Gomez; Eugene, Vorontsov; Meakin James, A.; Sebastien, Ourselin; Manuel, Wiesenfarth; Pablo, Arbeláez; Byeonguk, Bae; Sihong, Chen; Laura, Daza; Jianjiang, Feng; Baochun, He; Fabian, Isensee; Yuanfeng, Ji; Fucang, Jia; Ildoo, Kim; Klaus, Maier-Hein; Dorit, Merhof; Akshay, Pai; Beomhee, Park; Mathias, Perslev; Ramin, Rezaiifar; Oliver, Rippel; Ignacio, Sarasua; Wei, Shen; Jaemin, Son; Christian, Wachinger; Liansheng, Wang; Yan, Wang; Yingda, Xia; Daguang, Xu; Zhanwei, Xu; Yefeng, Zheng; Simpson Amber, L.; Lena, Maier-Hein; Jorge, Cardoso M.

doi:10.1038/s41467-022-30695-9

International challenges have become the de facto standard for comparative assessment of image analysis algorithms. Although segmentation is the most widely investigated medical image processing task, the various challenges have been organized to focus only on specific clinical tasks. We organized the Medical Segmentation Decathlon (MSD)—a biomedical image analysis challenge, in which algorithms compete in a multitude of both tasks and modalities to investigate the hypothesis that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a customdesigned solution. MSD results confirmed this hypothesis, moreover, MSD winner continued generalizing well to a wide range of other clinical problems for the next two years. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to scientists that are not versed in AI model training.