Using Big Data to Predict and Analyze Cooperation and Conflict


Given the tremendous humanitarian and economic costs of interstate conflict, there is a pressing need to better predict bilateral conflict and cooperation, and to understand what drives these events, so that conflict may be mitigated and cooperation strengthened. In this paper, we pursue both goals by applying statistical methods to the GDELT database, which contains information about 431,353,246 dyadic interactions that took place between 1979 and 2015 and were reported by the multilingual global press. We use machine learning techniques that take as input the occurrences and frequencies of diplomatic events, exhaustively categorized as: Consult, Make Public Statements, Appeal, Express Intent, Threaten, Disapprove, Approve, and Demand. These covariates, along with control variables intended to proxy each dyad member's socioeconomic, demographic, and geopolitical characteristics (including discretizations and dyad-level combinations thereof), are used to predict the occurrences and frequencies of outcome events, categorized as: Engage in Diplomatic Cooperation, Engage in Material Cooperation, Yield, Investigate, Coerce, Assault, Fight, Reject, Exhibit Military Posture, Reduce Relations, and Provide Aid. The resulting models allow us to identify the direction and magnitude of the effect of each type of diplomatic interaction on outcome events one month later. The data can also be segmented by arbitrary filters on the actors involved and the time spans covered, so researchers can compare these effects across subsets of the corpus and draw conclusions about how they differ across regions, time periods, and combinatorial pairings of country-level characteristics.
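The one-month lag described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout `(year, month, source, target, category)`, the toy events, the actor codes, and the function names are all hypothetical, and the dyad is assumed to be a directed source-target pair. The sketch only shows how monthly counts of diplomatic events for a dyad could be paired with counts of outcome events for the same dyad in the following month, which is the shape of data a predictive model would then be fit on.

```python
from collections import Counter, defaultdict

# Category names are taken from the abstract; everything else is illustrative.
DIPLOMATIC = {"Consult", "Make Public Statements", "Appeal", "Express Intent",
              "Threaten", "Disapprove", "Approve", "Demand"}
OUTCOME = {"Engage in Diplomatic Cooperation", "Engage in Material Cooperation",
           "Yield", "Investigate", "Coerce", "Assault", "Fight", "Reject",
           "Exhibit Military Posture", "Reduce Relations", "Provide Aid"}

# Hypothetical toy records: (year, month, source, target, category).
EVENTS = [
    (2014, 1, "AAA", "BBB", "Threaten"),
    (2014, 1, "AAA", "BBB", "Demand"),
    (2014, 1, "AAA", "BBB", "Threaten"),
    (2014, 2, "AAA", "BBB", "Coerce"),
    (2014, 2, "AAA", "BBB", "Consult"),
    (2014, 3, "AAA", "BBB", "Provide Aid"),
]


def month_index(year, month):
    """Map a calendar month to a consecutive integer so lags are simple offsets."""
    return year * 12 + (month - 1)


def monthly_counts(events):
    """Count events per (source, target, month index) and category."""
    counts = defaultdict(Counter)
    for year, month, src, tgt, category in events:
        counts[(src, tgt, month_index(year, month))][category] += 1
    return counts


def lagged_pairs(events, lag=1):
    """Pair diplomatic counts in month t with outcome counts in month t + lag."""
    counts = monthly_counts(events)
    last_month = max(m for (_, _, m) in counts)
    pairs = []
    for (src, tgt, m), current in sorted(counts.items()):
        if m + lag > last_month:
            continue  # the outcome month lies beyond the observed data
        # A dyad with no reported events in the outcome month gets zero counts.
        future = counts.get((src, tgt, m + lag), Counter())
        x = {c: current[c] for c in DIPLOMATIC if current[c]}
        y = {c: future[c] for c in OUTCOME if future[c]}
        pairs.append(((src, tgt, m), x, y))
    return pairs


if __name__ == "__main__":
    for key, x, y in lagged_pairs(EVENTS):
        print(key, x, y)
```

Filtering `EVENTS` by actor or date before calling `lagged_pairs` would correspond to the segmentation by actors and time spans mentioned in the abstract; control variables would be joined onto each pair separately.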

The Conflict Conference

Additional info to come.