This continues our series of student reflections and analysis authored by our research team.
Bias in Coding: How Precise Variables Lead to Unbiased Results
Stephanie Sorich
As monotonous as coding can sometimes feel, it’s also in these repetitive moments that I end up finding the cases that kick me out of my coding groove. It can feel like inputting meaningless values into a spreadsheet, until a value stops you in your tracks and makes you realize you were making assumptions about a case based upon the values combined. Alone, values are nothing more than units on a sheet; together, they spell out the story of a case.
Take, for instance, the case of Farhan Sheikh, a 19-year-old male arrested in Chicago for threatening to attack a women’s clinic in protest of abortion. The action itself is rightist in nature; however, paired with a non-Germanic name, the narrative does not seem to align with the biases we conventionally hold about crimes and who commits them. As I filled out Sheikh’s biographical information (name, case ID, and city and state of the crime), I had already created in my mind an idea of what this case was about. It was based on previously coded cases, so to a degree my assumptions were guided by some kind of logic. However, coders getting ahead of themselves poses a danger to the objective nature of coding each case. Gliding over a single factor by making an assumption could badly misdirect our understanding of a case.
Of course, the idea of stereotyping a situation is nothing new. However, finding ways around coders’ preconceived notions of crime and terror can be more complicated. Forcing ourselves as coders to break a case down into its smallest possible components compels us to break apart our own assumptions, and therefore to notice the very minute details that make this project unique. Terror and crime are complicated; our understanding of a case can only be as complete as our ability to code it accurately.
Since I joined the project just a few short months ago, things have constantly changed. New variables for coding have been created (and old cases re-coded), and existing variables have grown to include more potential values for coders to select. Changes in coding variables can alter how we record the type of crime, how the defendant identifies himself or herself, and how the government identifies the defendant: all pieces essential to understanding the facts of a case, and even more essential to analyzing it.
Putting a name to a crime (as in Farhan Sheikh’s case) is on the simpler end of the spectrum, and I was still personally taken aback by the idea that it “didn’t fit.” Defining our variables means that I do not define a Christian defendant, the coding manual does; I do not define State Speech Act, the coding manual does. A consensus that strong keeps each coder on the same path to a concrete understanding of how cases, no matter how different their circumstances, can be compared with one another.
My own goal going forward is to focus on each case one cell in the spreadsheet at a time. There are textbook cases, of course, but there will also continue to be cases that surprise. I also don’t want to fear the cases that may not fit the exact mould: in fact, I want to embrace them. Every case that causes us to pause and consider the way we as a team function only forces us to be better in our techniques and more critical of our own choices. At the early stage I’m in, the more critical I can be of myself and my participation, the better.
For more information on Farhan Sheikh’s case: https://www.washingtonpost.com/nation/2019/08/20/online-violent-threat-meme-site-chicago/