Student entries

Public Access to Data


This continues our series of student reflections and analysis authored by our research team.


Zion Miller

Two topics are in vogue now more than ever: terrorism and being data-driven. The Prosecution Project (tPP) combines the two in a previously unseen way, creating a public database of illegal political violence prosecuted within the United States. The project is critically important to our modern political discussion, in which politicians and constituents alike are trying to make sense of the government’s actions to prevent terrorism of various kinds.

Consider Alexandria Ocasio-Cortez’s questioning of the FBI about the rise of white nationalism within the United States, or the perennial question of how to fight Islamic terrorism. Data on these issues is difficult for even the government to access, let alone the average American. That’s why the Prosecution Project is so important.

With over 2,000 entries in the database (at the time of writing, December 2019), tPP has one of the largest datasets available to the public. Collecting that many entries requires tPP coders to scour the internet, news reports, and court documents for cases in a process known as “scraping.” Coders set up Google News alerts that push notifications for stories containing “terrorism,” “white nationalism,” “Islamic terror,” and similar terms, so that tPP receives the most current data. A Twitter account follows the DOJ, the FBI, and other projects like ours to keep track of the information they release. Coders also send in cases they stumble upon in their personal time that have somehow slipped through the net.

Filing these cases is a team effort. In the Fall 2019 cohort, 22 people work together to find and archive information that coders will later go over to assign variables. Two of those 22 people go over a source document from which possible indictments could be drawn, such as a DHS or FBI report, or look through another project’s database for cases we haven’t yet included. These two individuals take the offenders’ names and locations and place them into a spreadsheet. From there, another two people check the names from the source documents against our database to make sure they’re not already included. If a case is already included and the source offers no new information, it is discarded. If a case is included but the source document offers new information, it’s sent to a team of six coders who compare the data from the new source and update tPP’s database as needed. Finally, any new cases are sent to a team of eight coders who find additional documents to corroborate the initial source and file them in a collection for later addition to the database.
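The three-way sorting step described above (discard, update, or treat as new) can be sketched in a few lines of Python. This is only an illustration of the logic, not tPP's actual tooling: the `Lead` class, the `normalize` and `triage` functions, the matching on name and location alone, and the placeholder records are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lead:
    """One row from the intake spreadsheet (hypothetical structure)."""
    name: str
    location: str
    details: frozenset = frozenset()  # facts the source document reports


def normalize(name, location):
    """Build a lookup key that ignores case and extra whitespace."""
    clean = lambda text: " ".join(text.lower().split())
    return (clean(name), clean(location))


def triage(leads, database):
    """Sort leads into (discard, update, new).

    `database` maps a normalized (name, location) key to the set of
    details already recorded for that case (an assumed layout).
    """
    discard, update, new = [], [], []
    for lead in leads:
        known = database.get(normalize(lead.name, lead.location))
        if known is None:
            new.append(lead)        # not in the database yet
        elif lead.details - known:
            update.append(lead)     # known case, source adds information
        else:
            discard.append(lead)    # known case, nothing new
    return discard, update, new


# Placeholder data, invented purely for illustration.
database = {
    ("person a", "springfield, il"): {"indicted"},
    ("person b", "dayton, oh"): {"indicted", "sentenced"},
}
leads = [
    Lead("Person A", "Springfield, IL", frozenset({"indicted"})),
    Lead("person b", "Dayton,  OH", frozenset({"indicted", "plea agreement"})),
    Lead("Person C", "Tulsa, OK", frozenset({"indicted"})),
]
discard, update, new = triage(leads, database)
```

In this sketch the first lead would be discarded, the second routed to the coders who update existing entries, and the third passed to the team that corroborates new cases. Real matching would need to be more careful than a name-and-location key, since spellings and aliases vary across sources.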

This data can be accessed by a number of different people. Government actors, educational institutions, research groups, and private individuals all have equal access. That equal access is important. In today’s world, it is becoming increasingly difficult to find unbiased information that isn’t behind a paywall, especially on a topic like political violence. Already, this information is leading to important publications in academic journals and books. It is my hope that in the future, people will use this database to publish analysis that is easily understood by the American public, so that there is a greater understanding of the types of political violence that are committed and of how equitably that violence is prosecuted.
