Data mining of ecological data to confirm existing, or generate new, hypotheses on environmental risk factors for cancers of the brain and nervous system.


Frank de Vocht1, Kimberly Hannam1, Iain Buchan1
1The University of Manchester, Manchester, UK


There is a public health need to balance timely generation of hypotheses with cautious causal inference. However, for rare cancers, for example, standard epidemiological study designs may not have been able to elucidate causal risk factors. Open-access online databases, which have been vastly underused for data mining, can be used to investigate associations between risk factors and cancer incidence and/or prevalence. These associations can subsequently be used to evaluate existing but disputed hypotheses, to evaluate latency between exposure and clinical onset of cancers, and to generate new hypotheses at an ecological level for subsequent confirmation in epidemiological study designs that take a cautious approach to causal inference.


National age-adjusted incidence rates were obtained from the GLOBOCAN 2008 resource and combined with data from the United Nations Development Report and the World Bank list of Development Indicators. Data were analyzed using least-squares regression modelling.
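
As a rough illustration of the least-squares regression step, the sketch below fits national incidence on two country-level predictors. It is not the authors' code: the helper `fit_ols`, the variable names, and all numbers are hypothetical, and the data are randomly generated rather than taken from GLOBOCAN, the United Nations, or the World Bank.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    Returns the coefficient vector (intercept first) and R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Synthetic country-level data (assumption: purely illustrative values).
rng = np.random.default_rng(0)
n_countries = 150
log_gni = rng.normal(8.5, 1.2, n_countries)        # hypothetical log GNI per capita
subscriptions = rng.uniform(0, 150, n_countries)   # hypothetical subscriptions per 100 people
incidence = (1.0 + 0.3 * log_gni + 0.005 * subscriptions
             + rng.normal(0, 0.5, n_countries))    # hypothetical age-adjusted rate

coef, r2 = fit_ols(np.column_stack([log_gni, subscriptions]), incidence)
print("intercept, log_gni, subscriptions:", coef)
print("R^2:", r2)
```

Refitting the model without one predictor and comparing the two R^2 values gives the share of between-country variation attributable to that predictor in this kind of model.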


The 2008 national incidence of cancers of the brain and nervous system was associated with the continent in which a country was located, the Gross National Income in 2008, and the Human Development Index score. Surprisingly, the penetration rate of cellular subscriptions was the only risk factor consistently associated with higher incidence, although it explained only about 4% of the variation between countries. Analyses further indicated that the minimal latency period to study this association in case-control or cohort studies should be 11-12 years, but ideally more than 20 years. Other potential risk factors were identified and may be examined further in confirmatory studies.


Readily available ecological data may be underused, particularly for the study of risk factors for rare cancers and those with long latency times. The results of ecological analyses in general should not be overinterpreted in causal inference, but equally they should not be ignored where alternative signals of aetiology are lacking.