Constitutions enshrine the basic rules governing societies. Comparing constitutions can therefore reveal much about the values and institutions that countries have in common and those that distinguish them. But, with more than 200 constitutions in existence around the world, it is challenging to get a truly global perspective on the constitutional landscape.
Fortunately, there is legal data science. By treating constitutions as data, we can investigate more of them in less time. In this series of blog posts, we will investigate aspects of the world’s constitutions through a data science lens. Our goal is to showcase how a few lines of programming code can mine all the world’s constitutions in a matter of seconds to reveal interesting patterns and novel trends.
In this introductory blog post, we will provide a brief overview of data-driven comparative constitutional scholarship and introduce our dataset. Subsequent posts will (I) investigate during which era most constitutions were adopted, explore the different approaches constitutions use to (II) regulate firearms or (III) protect against sex and gender discrimination, and (IV) trace the legacy of colonialism in the formerly British Caribbean.
Data-driven comparative constitutional law
Comparative constitutional law research investigates differences and similarities across the world’s constitutions. It assesses trends in constitutional design from the 1215 Magna Carta to the 2019 Sudanese constitution and studies how constitutional norms are transplanted across the globe.
Until recently, comparative constitutional analysis was not scalable. Legal scholars tended to focus on smaller subsets of constitutions, comparing, say, constitutional law in common law countries, or studied specific aspects of constitutions, such as the protection of human rights. Political scientists and empirically inclined legal scholars have furthermore embarked on efforts to hand-code constitutional features. Both approaches are time- and labour-intensive.
The emergence of legal data science now offers an opportunity to complement this detail-oriented scholarship and render constitutional analysis scalable. So-called regular expressions allow specific patterns to be extracted across constitutions, quantitative text similarity measures reveal how constitutions differ, and machine learning algorithms can scale and even automate the mapping of content features.
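To make the first of these techniques concrete, here is a minimal sketch of a regular expression at work in R. The sample sentence is invented for illustration; the same pattern could be run over the full text of any constitution in our dataset.
#
#A minimal illustration of a regular expression: extracting four-digit years
#from an invented snippet of constitutional text.
sample_text <- "This Constitution, adopted in 1982 and revised in 2013, is the supreme law."
#gregexpr() locates every match of the pattern; regmatches() extracts the matches.
years <- regmatches(sample_text, gregexpr("[0-9]{4}", sample_text))[[1]]
print(years)
#[1] "1982" "2013"
#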
Some scholars have begun to leverage data science for the study of constitutions. Robert Shaffer, for example, used natural language processing to measure the level of executive discretion offered by national constitutions. In another example, David Law relied on a machine learning technique called topic modelling, which we cover in Lesson 7, to track the imprint of international human rights conventions on the text of constitutions. This line of research, however, is still in its infancy. That is why we decided to showcase further applications as part of this blog post mini-series.
The Constituteproject.org dataset
A major obstacle for computational legal research generally is the absence of data. Fortunately, the excellent work of www.constituteproject.org, spearheaded by Zachary Elkins, James Melton and Tom Ginsburg, provides easy access to the full text of the world’s constitutions.
We retrieved the HTML version of each constitution from www.constituteproject.org and saved it as a .txt file. File names were created from the website’s document titles, which indicate the country name as well as the constitution’s date of publication, date of reinstatement (if applicable) and date of last revision (if applicable). For simplicity, spaces, brackets and punctuation marks were replaced with underscores (e.g. “Guinea_2010.txt”, “Honduras_1982_rev_2013.txt”). Using the code introduced in Lesson 2, we then loaded the .txt files of close to 200 constitutions into R.
#
#We use the "readtext" package to load .txt files into R in bulk.
install.packages("readtext")
library(readtext)
#Load the text data into a data frame; the folder below contains the .txt files.
constitutions <- "~/Legal Data/Constitutions"
constitution_texts <- readtext(constitutions)
#
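As a quick check that the corpus loaded correctly, and as a first taste of the metadata work in the next post, the sketch below counts the documents and uses regular expressions to recover a country name and a year from each file name. The country and year columns are illustrative additions of our own and assume that every file name follows the naming convention described above.
#
#A quick sanity check on the loaded corpus. The "readtext" package stores the
#file name in the doc_id column and the full text in the text column.
nrow(constitution_texts)        #number of constitutions loaded
head(constitution_texts$doc_id) #e.g. "Guinea_2010.txt", "Honduras_1982_rev_2013.txt"
#Because the file names encode metadata, simple regular expressions can add
#illustrative country and year columns (assuming each file name contains a year).
constitution_texts$country <- gsub("_", " ", sub("_[0-9]{4}.*$", "", constitution_texts$doc_id))
constitution_texts$year <- as.integer(regmatches(constitution_texts$doc_id,
                                                 regexpr("[0-9]{4}", constitution_texts$doc_id)))
head(constitution_texts[, c("country", "year")])
#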
Due to copyright restrictions, we unfortunately cannot share the constitutions’ full texts. But by downloading the texts in HTML from www.constituteproject.org and following the file naming conventions described above, anyone can replicate the analysis of the global constitutional landscape in the posts that follow.
Let’s get started. The first substantive post of the series highlights how we can use meta-data to track constitution-making over time.