Machine learning techniques as an efficient alternative diagnostic tool for COVID-19 cases
1School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, 2362807 Valparaíso, Chile
2Academic Unit, Clínica MEDS, 7691236 Santiago, Chile
3Health Sciences Ph.D. Program, Universidad Católica de Murcia, 30107 Murcia, Spain
4Departments of Radiology, Clínica MEDS, 7691236 Santiago, Chile
Submitted: 07 April 2021 Accepted: 14 May 2021
Online publish date: 24 June 2021
Background: The SARS-CoV-2 pandemic has exposed the weakness of many health systems worldwide, saturating them and limiting access to treatment. A bottleneck in fighting the pandemic is the lack of diagnostic infrastructure for early detection of positive cases, particularly in rural and impoverished areas of developing countries. In this context, fast and less costly diagnostic systems based on machine learning (ML) are helpful. However, most research has focused on deep-learning techniques for diagnosis, which are computationally and technologically expensive. ML models have mainly been used as benchmarks and have not been fully explored in the existing literature on this topic.
Objective: To analyze the capabilities of ML techniques (compared to deep learning) to diagnose COVID-19 cases based on X-ray images, assessing the performance of these techniques and using their predictive power for such a diagnosis.
Methods: A factorial experiment was designed to establish this predictive power using chest X-ray images of healthy, pneumonia, and COVID-19 infected patients. The design considers data-balancing methods, feature extraction approaches, different learning algorithms, and hyper-parameter optimization. The ML techniques were evaluated with classification metrics, including accuracy, the area under the receiver operating characteristic curve (AUROC), F1-score, sensitivity, and specificity.
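As an illustration of the evaluation metrics named in the Methods, the following is a minimal Python sketch, not the authors' implementation (the keywords indicate that the study itself used R software). It assumes a binary setting in which label 1 marks the positive class and 0 the negative class. The function names `binary_metrics` and `auroc` and the toy data in the usage note are hypothetical and chosen only for this example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Each ratio falls back to 0.0 when its denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true-positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true-negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with the made-up labels `[1, 1, 1, 0, 0, 0, 0, 1]`, predictions `[1, 1, 0, 0, 0, 1, 0, 1]`, and scores `[0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]`, `binary_metrics` gives 0.75 for all four metrics and `auroc` gives 0.9375. The pairwise AUROC computation is quadratic in the number of cases, which is acceptable for a sketch but not for large datasets.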
Results: The experimental design provided means and confidence intervals for the predictive capability of the different ML techniques, which reached AUROC values as high as 90% with suitable sensitivity and specificity. Among the learning algorithms, support vector machines and random forest performed best. The down-sampling method for unbalanced data significantly improved the predictive power for the images used in this study.
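The down-sampling idea mentioned in the Results can be sketched as follows. This is a generic random down-sampling routine under stated assumptions, not the procedure used in the study: every class is reduced to the size of the smallest class by sampling without replacement. The function name `down_sample`, the `seed` parameter, and the class counts in the usage note are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def down_sample(
    samples: Sequence[T], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[T], List[Hashable]]:
    """Randomly drop cases from the larger classes so that every class
    ends up with as many cases as the smallest class."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not labels:
        raise ValueError("cannot balance an empty dataset")

    # Group the positions of the cases by class label.
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)

    n_min = min(len(idx) for idx in by_class.values())
    rng = random.Random(seed)  # local generator, so results are reproducible

    kept: List[int] = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))  # sampling without replacement
    kept.sort()  # preserve the original ordering of the retained cases

    return [samples[i] for i in kept], [labels[i] for i in kept]
```

With illustrative counts of 10 healthy, 6 pneumonia, and 3 COVID-19 cases, the balanced set contains 3 cases per class. Down-sampling discards data from the majority classes, so it trades training-set size for class balance.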
Conclusions: Our investigation demonstrated that ML techniques are able to identify COVID-19 infected patients. The results provided suitable values of sensitivity and specificity, minimizing the false-positive and false-negative rates. The models were trained with notably low computational resources, which facilitates access and deployment in rural and impoverished areas.
Keywords: Artificial intelligence; Deep learning; PCR; ROC curve; R software; SARS-CoV-2; X-rays
Nicolás Bustos, Manuel Tello, Guillermo Droppelmann, Nicolás García, Felipe Feijoo, Víctor Leiva. Machine learning techniques as an efficient alternative diagnostic tool for COVID-19 cases. Signa Vitae. 2021. doi:10.22514/sv.2021.110.