
Replicating Data Mining Techniques for Development: A Case Study of Corruption

Available here (external link, PDF).

An MSc thesis examining how small civil society organisations can adapt big data techniques for international development applications.

Also presented at the Earth Institute, Columbia University, New York, in 2013.


Data mining has a reputation in social science for lacking statistical rigour. This study challenges that reputation and argues that, whilst the method (like any other) can be abused, it holds particular promise as a tool for monitoring and exploratory research, especially for smaller development organisations. Drawing on recent advances in adapting commercial ‘Big Data’ techniques for international development, the study applies a text mining methodology to an example data set of global news reports to measure the level of discussion about corruption. The methodology shows particular promise for tracking the dissemination of ideas and concepts, although it depends heavily on contextual interpretation and on the quality of the data set used.
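
To give a rough sense of what measuring the "level of discussion" in news reports might look like in practice, the following is a minimal sketch, not the thesis's actual method. It assumes each report is a (period, text) pair and counts the share of reports per period that mention a corruption-related term. The keyword list, the function name `corruption_share_by_period`, and the sample reports are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical keyword pattern; the terms used in the thesis are not given here.
CORRUPTION_TERMS = re.compile(
    r"\b(corrupt\w*|brib\w*|embezzl\w*|kickback\w*)\b", re.IGNORECASE
)


def corruption_share_by_period(reports):
    """Return {period: share of reports mentioning a corruption-related term}.

    `reports` is an iterable of (period, text) pairs, e.g. ("2012-01", "...").
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for period, text in reports:
        totals[period] += 1
        if CORRUPTION_TERMS.search(text):
            hits[period] += 1
    return {period: hits[period] / totals[period] for period in sorted(totals)}


# Invented sample data, for illustration only.
sample_reports = [
    ("2012-01", "Minister accused of taking bribes in road contract."),
    ("2012-01", "Heavy rains delay harvest in northern provinces."),
    ("2012-02", "Anti-corruption commission opens new inquiry."),
    ("2012-02", "Court hears embezzlement case against former mayor."),
]

shares = corruption_share_by_period(sample_reports)
print(shares)
```

Using a share of reports rather than a raw count is one simple way to avoid mistaking a growing news archive for growing discussion. As the abstract cautions, any such figure still depends on contextual interpretation and on the quality of the underlying data set.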