Scraping the Violence Project’s Mass Shooter Database (part 1 of 2)

The thoughtful Greg Reese from Miami’s Research Computing Support sent our team a link to a news story today. The email was titled, “Might have relevance to your work” and linked to a story by Vice News, “Nearly All Mass Shooters Since 1966 Have Had 4 Things in Common.”

The article presents the Mass Shooter Database (MSD), a recent data set published by The Violence Project.

Greg was right to send this our way, as there are likely overlaps between tPP’s case criteria and the MSD’s. In its secondary review of the data, Vice noted:

“[Mass shootings are] also increasingly motivated by racial, religious, or misogynist hatred, particularly the ones that occurred in the past five years.”

As soon as I saw this, I requested access to the data set and promptly received a link. The meticulous, easy-to-navigate data set provides event data on 171 shootings. From there, I sorted columns to prioritize certain variable values and trim the 171 events down to those likely to meet tPP’s inclusion criteria.

I began by eliminating any shootings prior to 1990, as these fall outside tPP’s date range.
I then used the “on scene outcome” variable to remove all cases where the shooter died on scene, keeping only those in which the individual was apprehended. Since tPP requires the charging of a crime, only individuals who survived their attack could be included.
I then sorted by motive. The data set codes for 13 “grievances” and “motivations.” Using these codes, I highlighted all cases that displayed any of the following values:
- Racial element
- Interest in white supremacy/Notable racism/Xenophobia
- Religious hate
- Homophobia

I also included two cases coded as “Notable misogyny,” as this is a recurrent element in some cases we have added to our project. I then eliminated all of the cases that displayed only other grievances, as these would not likely meet our definition of a socio-political motive.
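The screening steps above were done by sorting and coloring a spreadsheet, but the same logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`year`, `on_scene_outcome`, `motives`), the outcome label `"apprehended"`, and the toy rows are all hypothetical and are not taken from the Mass Shooter Database. The sketch also treats “Notable misogyny” as a qualifying value across the board and keeps any case with at least one qualifying value, whereas the actual review involved case-by-case judgment.

```python
# Hypothetical field names and labels; the real data set may differ.
QUALIFYING_MOTIVES = {
    "Racial element",
    "Interest in white supremacy/Notable racism/Xenophobia",
    "Religious hate",
    "Homophobia",
    "Notable misogyny",
}


def filter_candidates(records, first_year=1990):
    """Return the records that pass the three screening steps."""
    candidates = []
    for rec in records:
        # Step 1: drop shootings prior to 1990.
        if rec["year"] < first_year:
            continue
        # Step 2: keep only cases where the individual was apprehended.
        if rec["on_scene_outcome"] != "apprehended":
            continue
        # Step 3: keep cases with at least one qualifying motive value.
        if not QUALIFYING_MOTIVES & set(rec["motives"]):
            continue
        candidates.append(rec)
    return candidates


# Toy rows invented for illustration only; not real cases.
toy_records = [
    {"case_id": "A", "year": 1985, "on_scene_outcome": "apprehended",
     "motives": ["Religious hate"]},
    {"case_id": "B", "year": 1999, "on_scene_outcome": "died on scene",
     "motives": ["Racial element"]},
    {"case_id": "C", "year": 2015, "on_scene_outcome": "apprehended",
     "motives": ["Racial element"]},
    {"case_id": "D", "year": 2003, "on_scene_outcome": "apprehended",
     "motives": ["Employment issue"]},
]

print([r["case_id"] for r in filter_candidates(toy_records)])  # ['C']
```

In this toy run, only case C survives: A predates 1990, B did not survive the scene, and D carries no qualifying motive value.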

This process produced a final set of 13 cases which, according to my interpretations of the coding criteria as provided by The Violence Project, likely meet the criteria for inclusion in tPP. These cases will subsequently be assigned to coders to investigate, and eventually coded for inclusion or exclusion. The cases identified (prior to individual investigation) are:

Kenneth French
Colin Ferguson
Hastings Arthur Wise
Richard Baumhammers
Steven Stagner
Chai Vang
Dylann Roof
Arcan Cetin
Nikolas Cruz
Dimitrios Pagourtzis
Jarrod Ramos
Robert Bowers
Patrick Crusius

Our scraping procedure for data sets requires that we first check whether an incident is already included in the project. This involves searching the final data set as well as a series of ‘in progress’ sheets managed by coding teams. If the case is already included, as with the 4 defendants underlined, we will evaluate our coding choices against the new data for triangulation and possible modification. Since the data provided by The Violence Project are more detailed in certain respects, we may be able to represent the record within tPP more accurately by exploring the other researchers’ coding decisions.

This search yielded confirmatory information on 4 cases, and 9 likely new case starters. These 9 cases will be investigated by coding teams. They will be worked through the inclusion/exclusion decision tree, and if they pass, entered into the team workflow.
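The check against existing records amounts to a set-membership test over the final data set and the ‘in progress’ sheets. A minimal sketch follows, assuming each source can be reduced to a list of defendant names; the function names, the name-only matching, and the placeholder names are hypothetical, and a real check would likely also compare dates and locations to catch spelling variants.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(name.lower().split())


def split_known_and_new(candidates, final_names, in_progress_sheets):
    """Split candidate names into (already included, likely new case starters)."""
    known = {normalize(n) for n in final_names}
    for sheet in in_progress_sheets:
        known.update(normalize(n) for n in sheet)
    already = [c for c in candidates if normalize(c) in known]
    new = [c for c in candidates if normalize(c) not in known]
    return already, new


# Placeholder names invented for illustration only.
candidates = ["Defendant One", "Defendant Two", "Defendant Three"]
final_names = ["defendant one"]
in_progress_sheets = [["Defendant  Three"], []]

already, new = split_known_and_new(candidates, final_names, in_progress_sheets)
print(already)  # ['Defendant One', 'Defendant Three']
print(new)      # ['Defendant Two']
```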
