Original Research

Open Access Special Issue

Data-driven tools for assessing and combating COVID-19 outbreaks in Brazil based on analytics and statistical methods

  • Raydonal Ospina1
  • André Leite1
  • Cristiano Ferraz1
  • André Magalhães2
  • Víctor Leiva3,*

1Department of Statistics, CASTLab, Universidade Federal de Pernambuco, 51280-000 Recife, Brazil

2Department of Economics, Universidade Federal de Pernambuco, 51280-000 Recife, Brazil

3School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, 2340000 Valparaíso, Chile

DOI: 10.22514/sv.2021.253 Vol. 18, Issue 3, May 2022, pp. 18-32

Submitted: 18 August 2021 Accepted: 03 November 2021

Published: 08 May 2022

*Corresponding Author(s): Víctor Leiva


The COVID-19 pandemic is one of the worst public health crises ever faced in Brazil and the world. One of the main challenges that healthcare systems face in decision-making is that the protocols tested in other epidemics do not guarantee success in controlling the spread of COVID-19, given its complexity. In this context, an effective response to guide the competent authorities in adopting public policies to fight COVID-19 depends on thoughtful analysis and effective data visualization, ideally based on different data sources. In this paper, we discuss and provide data-analytics tools that can help in responding to the COVID-19 outbreak in Recife, Brazil. We use exploratory data analysis and inferential methods to determine the trend changes in COVID-19 cases and their effective or instantaneous reproduction numbers. Based on data on confirmed COVID-19 cases disaggregated at a regional level, we note a heterogeneous spread in most megaregions in Recife, Brazil. When the decreed quarantines are incorporated into the analysis, their effectiveness is detected in the regions. Our results indicate that the measures have effectively curbed the spread of the disease in Recife, Brazil. However, other factors can cause the effective reproduction number to fall outside the expected ranges, and these must be studied further.
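
To make the notion of an instantaneous reproduction number concrete, the following is a minimal sketch of a commonly used ratio-type estimator: the number of new cases on day t divided by a weighted sum of the cases on preceding days, with weights taken from a serial-interval distribution. It is not necessarily the estimator used in the paper. The function name `instantaneous_r`, the toy case counts, and the two-day weight vector are hypothetical and chosen only for illustration.

```python
def instantaneous_r(incidence, serial_weights):
    """Ratio estimator R_t = I_t / sum_{s=1..S} w_s * I_{t-s}.

    incidence      : daily new-case counts (list of numbers)
    serial_weights : weights for lags 1..S (assumed serial-interval shape)
    Returns one value per day; None where no estimate is available.
    """
    total = sum(serial_weights)
    w = [x / total for x in serial_weights]  # normalise weights to sum to 1
    estimates = []
    for t in range(len(incidence)):
        if t < len(w):
            # not enough history to cover the full serial-interval window
            estimates.append(None)
            continue
        # infection pressure exerted by cases reported s days earlier
        pressure = sum(w[s - 1] * incidence[t - s] for s in range(1, len(w) + 1))
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


# Hypothetical toy series: growth, plateau, then decline
cases = [10, 20, 40, 80, 80, 80, 40, 20]
weights = [0.5, 0.5]  # assumed, not taken from the paper
r_t = instantaneous_r(cases, weights)
print(r_t)
```

In this toy series the estimate is above 1 while cases grow, equals 1 on the plateau, and drops below 1 once cases decline, which is the pattern one would look for when assessing whether a quarantine measure has curbed transmission.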


Basic and effective reproduction numbers; Data science; Data visualization; Growth model; SARS-CoV-2; Smart analytics; Time-series models

Cite and Share

Raydonal Ospina, André Leite, Cristiano Ferraz, André Magalhães, Víctor Leiva. Data-driven tools for assessing and combating COVID-19 outbreaks in Brazil based on analytics and statistical methods. Signa Vitae. 2022; 18(3): 18-32.


