Relationships between Demographic, Geographic, and Environmental Statistics and the Spread of Novel Coronavirus Disease (COVID-19) in Italy

Background: Since January 2020, the coronavirus disease 2019 (COVID-19) pandemic has raged around the world, causing nearly a million deaths and numerous severe economic crises. In this context, Italy has been one of the most affected countries.
Objective: This study investigated significant correlations between COVID-19 cases and demographic, geographical, and environmental statistics of each Italian region from February 26 to August 12, 2020. We further investigated the link between the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and particulate matter (PM) 2.5 and 10 concentrations before the lockdown in Lombardy.
Methods: All demographic data were obtained from the AdminStat Italia website, and geographic data were from the Il Meteo website. Data were collected weekly. Data on PM2.5 and PM10 average daily concentrations were collected from previously published articles. We used Pearson's coefficient to correlate quantities that followed a normal distribution and Spearman's coefficient to correlate quantities that did not.
Results: We found significant strong correlations between COVID-19 cases and population number in 60.0% of the regions. We also found a significant strong correlation between the spread of SARS-CoV-2 in the various regions and both their latitude and the historical averages (last 30 years) of their minimum temperatures. We identified a significant strong correlation between the number of COVID-19 cases until August 12 and the average daily concentrations of PM2.5 in Lombardy until February 29, 2020. No significant correlation with PM10 was found over the same periods. However, we found that 40 μg/m^3 for PM2.5 and 50 μg/m^3 for PM10 are plausible thresholds beyond which particulate pollution clearly favors the spread of SARS-CoV-2.
Conclusion: Since the spread of SARS-CoV-2 is correlated with historical minimum temperatures and with PM10 and PM2.5 concentrations, health authorities are urged to monitor pollution levels and to invest in precautions for the arrival of autumn. Furthermore, we suggest creating awareness campaigns that promote air recirculation in enclosed places and the avoidance of exposure to the cold.


Introduction
The Chief of the World Health Organization (WHO) has declared the coronavirus disease 2019 (COVID-19) pandemic the most severe pandemic in recent human history [1]. To date, over 200 countries have been involved, with over 29 million cases and 900,000 deaths [2]. Between February and April 2020, Italy was the most affected nation in terms of the number of new cases and deaths [3]. Despite a drastic drop in infections during the summer months, it is still among the top 20 nations afflicted by the novel coronavirus [4]. On January 23, two Chinese tourists tested positive for COVID-19 near Rome [5]. However, the patients appeared to have been readily isolated, averting extensive contagion. Toward the end of February, the situation accelerated and fell outside the control of the authorities. From February 21, to counter the spread of the virus, the Italian government declared various lockdowns that extended around the outbreak of Lodi, in the Lombardy region. On March 10, the lockdown went into effect nationwide [6]. For these reasons, we believe Italy is one of the main sources of information for fully understanding the behavior of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
To the best of our knowledge, this is the first study to provide a complete and detailed history of the correlations between the spread of SARS-CoV-2 and demographic, geographic, and environmental statistics in Italy. Since SARS-CoV-2 has shown very different virological and epidemiological characteristics depending on the region [7], our analysis made it possible to highlight and mathematically quantify anomalous and/or local properties and behaviors, and to test the statistical significance of the hypotheses and scenarios proposed by other researchers.

Materials And Methods
We collected data on Italian demographic statistics and pollution from the AdminStat Italia website and a previously published article [7,8]. We looked for significant Pearson (R) and Spearman (r) correlations between COVID-19 cases per province in each region from February 26 until August 12, 2020, and the following quantities: birth rate, median age, population density, death rate, old age index, population number, percentage of unmarried people, family members, growth rate, percentage of divorcees, percentage of foreigners, foreign growth rate, and percentage of widowers. Data were collected weekly. For each correlation identified, we calculated the angular coefficient (slope) of the interpolating line and correlated it with geographical characteristics such as the regions' latitude and minimum temperatures in February and March (last 30 years of historical data). We collected geographic data from the Il Meteo website [9]. The survey was conducted in the regions with more than three provinces: Abruzzo, Calabria, Campania, Emilia-Romagna, Friuli Venezia Giulia, Lazio, Liguria, Lombardia, Marche, Piemonte, Sardegna, Sicilia, Toscana, Veneto. The week in which the correlation approached the threshold of statistical significance is reported in parentheses, for example, Abruzzo (4). Finally, we extended a previous study that searched for correlations between the daily average concentrations of particulate matter (PM) 10 and 2.5 until February 29 and COVID-19 cases for every province in the Lombardy region from February 26 to August 12, 2020 [7]. The temporal discrepancy between the two detection periods is due both to the fact that COVID-19 can take up to 14 days to manifest symptoms and to our intention to test for long-term effects of PM [2,7].
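The angular coefficient of the interpolating line can be illustrated with a minimal Python sketch of an ordinary least-squares fit of y = a + bx. This is a stand-in for the fit performed in Igor Pro, not a reproduction of it: the function name `fit_line` and the province figures are hypothetical, and the fit is assumed to be unweighted.

```python
from statistics import fmean


def fit_line(x, y):
    """Unweighted least-squares fit of y = a + b*x; returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx        # angular coefficient (slope)
    a = my - b * mx      # intercept
    return a, b


# Hypothetical provinces of one region (illustrative values, not study data):
# population number and cumulative COVID-19 cases in a given week.
population = [100_000, 200_000, 300_000, 400_000]
cases = [150, 350, 550, 750]

a, b = fit_line(population, cases)
print(f"a = {a:.1f}, b = {b:.4f}")
```

Under these assumptions, one slope b per region would then be correlated with that region's latitude or minimum temperature.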

Statistical analysis
Each result was reported together with its standard deviation (SD) and p-value (p); we chose a statistical significance threshold of α=.05. For each sample, kurtosis (k) and skewness (s) were calculated using Microsoft Excel 2020 software. We used the formulas (24/n)^(1/2) and (6/n)^(1/2), where n is the sample size, to obtain their respective standard deviations [10]. We considered data compatible with a normal distribution only when tk=k•(24/n)^(-1/2) ≤ 1.5 and ts=s•(6/n)^(-1/2) ≤ 1.5; here, the Pearson correlation index was deemed more reliable. When tk ∈ ]1.5,3], ts ≤ 3 or tk ≤ 3, ts ∈ ]1.5,3], we considered it appropriate to evaluate both correlations. Finally, when tk, ts >3, we judged the Spearman correlation index to be more appropriate. When we highlighted linear correlations, we used the Igor Pro 6.37 software (Wavemetrics, Lake Oswego, OR, USA) to interpolate the data through the equation y=a+bx. In the results section, we report the mean values R* and r* of the correlations identified, with their 95% confidence intervals. Coefficients that exceeded .700 with p≫.05 were still reported, with the absence of statistical significance specified. To verify the importance of each correlation, we constructed a suitable correlation matrix. We defined as pure those correlations that were independent of other quantities correlated with COVID-19.
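The selection rule above can be sketched in Python as follows. Several points are assumptions rather than details given in the text: the function names are hypothetical, kurtosis is taken as excess kurtosis computed from population moments (Excel applies bias-corrected sample formulas, so values differ slightly for small n), the ratios are compared in absolute value, and mixed cases not covered by the stated rule (for example tk > 3 with ts ≤ 3) are mapped to "both".

```python
from math import sqrt
from statistics import fmean


def shape_ratios(sample):
    """Return (tk, ts): excess kurtosis and skewness divided by their
    approximate standard deviations sqrt(24/n) and sqrt(6/n)."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least 3 observations")
    m = fmean(sample)
    m2 = sum((v - m) ** 2 for v in sample) / n
    if m2 == 0:
        raise ValueError("sample must not be constant")
    m3 = sum((v - m) ** 3 for v in sample) / n
    m4 = sum((v - m) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis (0 for a normal distribution)
    return kurt / sqrt(24 / n), skew / sqrt(6 / n)


def choose_coefficient(sample):
    """Return 'pearson', 'spearman', or 'both' from the tk/ts thresholds."""
    tk, ts = (abs(t) for t in shape_ratios(sample))  # absolute value: assumption
    if tk <= 1.5 and ts <= 1.5:
        return "pearson"   # compatible with a normal distribution
    if tk > 3 and ts > 3:
        return "spearman"  # clearly non-normal
    return "both"          # intermediate or uncovered cases (assumption)


# Illustrative samples, not study data.
print(choose_coefficient([1, 2, 3, 4, 5]))   # symmetric sample
print(choose_coefficient([1] * 19 + [50]))   # one extreme outlier
```

With two variables, a conservative reading is to apply the rule to each sample and use Pearson's coefficient only when both are compatible with normality.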

TABLE 1: Pearson and Spearman correlations between COVID-19 cases and population density and between COVID-19 cases and population numbers.
cov = COVID-19 cases, tk = ratio between kurtosis and standard deviation of kurtosis, ts = ratio between skewness and standard deviation of skewness, R = Pearson's correlation value, r = Spearman's correlation value, p = p-value.

Speed of spread of COVID-19: correlation with latitude and minimum temperatures
We found a significant strong correlation between the angular coefficients b of the various regions and their latitude (

Correlation between COVID-19 cases and PM10 and PM2.5 daily averages (Lombardy only)
We identified a strong significant correlation between the number of COVID-19 cases until August 12 and the average daily concentrations of PM2.5 in Lombardy until February 29, 2020 (r=.76, p=.004). We found no significant correlation with PM10 over the same periods. Therefore, in the long run, the correlation with PM2.5 was statistically stronger than that with PM10. However, in the early stages of the outbreak (until February 26, 2020), we found correlations with both PM2.5 (r=.63, p=.029) and PM10 (r=.72, p=.009). In the second week of March, the correlation with PM10 disappeared, while that with PM2.5 persisted until the end of the observation period. We identified a drastic surge in COVID-19 cases near 40 μg/m^3 PM2.5 and 50 μg/m^3 PM10; therefore, these may be thresholds beyond which particulate pollution clearly favors the spread of SARS-CoV-2 (Figure 1). All the correlations found are statistically valid, as they are not related to the other quantities analyzed.
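As a rough plausibility check on the long-period PM2.5 result, the reported coefficient can be converted into a t statistic. This is a sketch under two assumptions that the text does not state: that the sample consists of the n = 12 provinces of Lombardy, and that significance is assessed with the usual Student t approximation with n − 2 degrees of freedom.

```latex
t = r\,\sqrt{\frac{n-2}{1-r^{2}}}
  = 0.76\,\sqrt{\frac{10}{1-0.76^{2}}}
  = 0.76\,\sqrt{\frac{10}{0.4224}}
  \approx 3.70
```

Under these assumptions, the value exceeds the two-sided critical value of 2.228 for 10 degrees of freedom at the .05 level, which is consistent with the correlation being reported as significant.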

Other minor correlations
In 20% of the regions, we found significant correlations between COVID-19 cases and birth rate, growth rate, and percentage of unmarried people; among these, only 13.33% were unrelated to the correlations with population density and population number (Table 3). We found four correlations between COVID-19 cases and median age (26.67%). However, three of these were positive while one was negative. The most peculiar case is that of the Marche region, where we identified significant negative correlations with median age, death rate, and old age index, and positive correlations with the percentages of unmarried people and divorcees. Since these data have no national relevance, we suggest using them only as statistical indicators of possible regional or local phenomena related to the spread of SARS-CoV-2.

Plausible scenarios
The results are compatible with the following scenarios:
1. The almost immediate correlations between COVID-19 cases and the number of inhabitants per province in Campania, Lazio, Sicily, Liguria, and Piedmont strongly indicate that SARS-CoV-2 was in circulation for a long time before the first confirmed case. By contrast, the trends in Lombardy and Veneto suggest that the virus took four weeks to correlate with their population size. It is therefore plausible that COVID-19 spread in Italy from January 2020 (or earlier), whereas the Veneto and Lombardy regions experienced a more gradual contagion.
2. Low temperatures can weaken the immune defenses, favoring infection with SARS-CoV-2.
3. Low temperatures can push people to have gatherings indoors and without air circulation, favoring the spread of the novel coronavirus.
4. Low temperatures could promote the survival of the virus.
6. PM2.5 can serve as an excellent carrier for the novel coronavirus.
7. High concentrations of PM10 may have contributed to the spread of the virus by acting as a carrier. This is supported not only by the initial correlation with PM10 but also by the extremely high PM10 concentration found in the first outbreak in Lodi (67 μg/m^3).
8. It is possible that SARS-CoV-2 has undergone evolutionary mutations in northern Italy (particularly in Lombardy). This agrees with the possibility that COVID-19 spread nationwide, mistaken for flu or common colds until the aforementioned mutation occurred.

Discussion
The first two confirmed cases of COVID-19 in Italy were two Chinese tourists who landed on January 19, 2020 [5]. They were able to travel freely, making a brief stop in Parma and staying in a hotel in Rome [11]. It is statistically unlikely and morally incorrect to associate them with the arrival of the novel coronavirus in Italy. In particular, the results of the present study suggest that SARS-CoV-2 has been circulating in Italy since early January, probably mistaken for flu or the common cold. The most likely hypothesis is that the virus had been present in Italy before their arrival, given that those suffering from COVID-19 were largely asymptomatic [12], that symptoms associated with COVID-19 are milder in children than in adults [13], that the incubation period ranges from two to 14 days with an average of 5-6 days [14], that 80% of people with COVID-19 have mild symptoms [15], that average mortality is about 1% [16,17], and that the fatality rate is proportional to age and extremely high for patients over 65 years [18].
All these factors drastically increase the time needed to identify and isolate a person infected with COVID-19 since: a) asymptomatic individuals were infected without their knowledge, as were the people with whom they had been in contact, b) children (besides the problem of asymptomatic infection) showed milder symptoms than adults, inducing parents and relatives to associate them with normal colds or flu, c) symptoms appeared up to 14 days after contagion, allowing infected subjects to infect other people unknowingly, d) in the overwhelming majority of cases, symptomatic patients showed mild symptoms that did not cause concern among work colleagues and relatives, and e) the death risk was extremely concentrated in the older age groups, which prevented an easy assessment of the extent of the epidemic since these groups were naturally at higher risk of death; that is, until high numbers were reached, the collective psychological impact was low.
The marked correlation between the speed of the spread of the virus among the population and the minimum temperatures of each region suggests that the late autumn and winter seasons can strongly favor the pandemic. In fact, due to low external temperatures, people more frequently gather indoors without recirculation of air, which creates favorable conditions for the proliferation of viruses [19]. Like rhinoviruses, adenoviruses, and influenza viruses, the novel coronavirus may also survive better in colder and drier climates [20-23]. Furthermore, sudden changes in temperature and cold can lower immune defenses [22]. The strong and prolonged correlations we found with PM2.5 in the Lombardy region indicate that this type of pollution played an important role in the epidemic. This may be linked to the fact that fine particles substantially reduce immune defenses as well as increase the severity of symptoms due to damage induced in the respiratory system [24,25]. Moreover, PM2.5 (and PM10) could act as a virus carrier [26]. In fact, Figure 1 shows a drastic increase in the number of COVID-19 cases when PM2.5 and PM10 exceed approximately 40 μg/m^3 and 50 μg/m^3, respectively. These could be the thresholds beyond which particulate matter significantly favors the spread of SARS-CoV-2. In addition to empirical data and other studies, this hypothesis is consistent with the first outbreak in Lodi, where the PM10 average daily concentration was the highest of the month (67 μg/m^3) [7,27]. However, our results show a stronger association of the spread of the virus in Lombardy with PM2.5 than with PM10 (Figure 1).
Finally, considering that in Lombardy the correlation between COVID-19 cases and the number of inhabitants per province became significant after four weeks, symptoms were more severe than in other regions, and the basic reproduction number (R0) seems to have been the highest, we suggest that an evolutionary genetic mutation may have occurred in Lombardy [7]. In fact, although the genome does not appear to have changed substantially, Zhang et al. showed that even small mutations could cause significant changes in SARS-CoV-2 behavior [28].

Limitations
Statistical correlations can provide valid support for hypotheses and theories as well as fundamental indicators of phenomena to be explored; however, they cannot demonstrate the causal nature of a phenomenon.

Conclusions
We found significant strong and lasting correlations between the spread of SARS-CoV-2 and the number of inhabitants of each region, between the speed of spread of COVID-19 and the historical minimum temperatures in the months of February and March, and between the number of COVID-19 cases and the average daily concentrations of PM2.5. Correlations with the average daily concentrations of PM10 were found up to the first week of March, indicating that this type of pollution also played a role in the spread of the virus, linked to exceeding specific daily peaks. Therefore, we suggest that the health authorities pay particular attention to the arrival of the winter months, not only by investing in adequate precautions but also by launching awareness campaigns on indoor air recirculation and on avoiding exposure to the cold. In addition, it will be necessary to carefully monitor levels of PM10 and PM2.5, limiting the use of cars if necessary.

Additional Information
Disclosures
Human subjects: All authors have confirmed that this study did not involve human participants or tissue. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue.

Conflicts of interest:
In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.