EECS 298: Social Consequences of Computing

Homework 3: To Predict and Serve

Due 11:59 PM EST on April 11

Coding Submission: 40 points

Written Submission: 20 points

Total Points: 60 points

Submission

This assignment consists of two parts:

  1. Programming - submit HW3.py
  2. Written Reflection - submit a PDF with your responses

Both parts will be submitted on Gradescope. Part 1 will be submitted to Homework 3: Coding Submission and Part 2 will be submitted to Homework 3: Written Submission. To access Gradescope, use the link on Canvas.

Part 1 will be graded using an autograder, so you'll be able to get feedback as soon as you submit - you can submit any number of times until you feel happy with your score! Your code will be tested on private cases in addition to the public cases you are given, so your code should generalize properly to other, similar calculations. Your programming implementation will be graded on correctness. We encourage collaboration, but all work you submit must be your own.

Part 2 will be graded manually, but you can still resubmit as many times as you need to before the deadline. You are required to typeset your written responses in a document editor or a program like LaTeX.

All writing must be your own, and collaboration must not result in code or writing that is identifiably similar to other solutions.

Introduction to Predictive Policing

Predictive policing refers to the use of data analysis and machine learning techniques to identify patterns and make predictions about future criminal activity. Typical stated goals of predictive policing include allocating police resources more effectively and efficiently, reducing crime rates, and improving public safety. Predictive policing has seen increasingly widespread use in the US [1] [2]. Perhaps the predictive policing algorithm that has received the most media coverage is PredPol, developed by a private company (formerly called PredPol, then Geolitica, and now absorbed into SoundThinking, formerly ShotSpotter, as part of a similar product called ResourceRouter) [3].

PredPol is software developed by social scientists in collaboration with the Los Angeles Police Department (LAPD). It uses historical arrest data to predict the probability of future arrests occurring in specific areas using a machine learning model based on earthquake prediction called Epidemic Type Aftershock-Sequences (ETAS) [4]. The algorithm divides a city into grid cells as small as 500x500 feet and assigns each cell a risk score: the probability of an arrest occurring in that cell, which PredPol assumes is a proxy for a crime occurring there. The idea is then to increase the presence of officers in the cells with the highest risk of a crime. However, critics argue that flawed and systemically biased data results in racially discriminatory predictions and policing, where the use of such algorithms can help produce the very same flawed data that then gets fed back into these systems. The result is a predictive policing system that reinforces patterns of over-policing, creating a feedback loop that sustains a cycle of oppression [2] [5] [6].

In this assignment, we will investigate these claims using real arrest data from the city of Oakland, California from 2009 to 2011. For cells, we will use census tracts, the units of area that the US Census uses to collect population totals. These are convenient because they break up the city of Oakland into reasonably sized pieces, and the US Census conveniently collects lots of demographic data on the people who live in each tract. Using this data, we will investigate the racial distribution of the people arrested and of the people who could be affected by the use of the PredPol algorithm on this data. We will also investigate what happens when assigning additional police to a tract results in additional arrests, which then get fed back into the model. We will loosely follow the original analysis of Lum and Isaac [6].

Datasets

For this assignment, you are provided three datasets as described above. For your implementations of different classes and functions, you will read in the data from CSV files. Descriptions of each of these files are below and the HW3.py section will describe how to read in each dataset.

Arrest Data

You are provided real data collected from the Oakland police department about arrests in the city in the form of arrests.csv. This dataset contains details on drug-related arrests in Oakland from 2009 to 2011. The columns include information such as the description of the incident and the location of the arrest. Use wget to get the file arrests.csv.

wget https://raw.githubusercontent.com/eecs298/eecs298.github.io/main/files/arrests.csv

The columns of the data are as follows:

Demographic Data

You are also provided racial demographic data from the 2010 US Census for each tract where the Oakland police can make arrests (as estimated by this dataset and Lum and Isaac [6]) in 2010_Oakland_Tract_Demographics.csv. This file has a column for the tract and the number of people living in that tract from each racial category the US Census collects. These racial categories are dictated by the US Census Bureau and are measured via people self-reporting on the US Census. You may assume that the list of unique tracts in arrests.csv is the same as the list of tracts in this file. Use wget to get the file 2010_Oakland_Tract_Demographics.csv.

wget https://raw.githubusercontent.com/eecs298/eecs298.github.io/main/files/2010_Oakland_Tract_Demographics.csv

The first column of 2010_Oakland_Tract_Demographics.csv is Tract and specifies the census tract that the population numbers for each demographic in a single row correspond to.

NOTE: Tract is formatted differently here than in arrests.csv, and implementation details are given below for how to resolve this difference.

The remaining columns of the data correspond to demographic data of the tract and are as follows:

Drug Use Data

Finally, you are provided with rates of drug use, broken down by demographic category, in 2010_drug_use.csv. This data is from the National Survey on Drug Use and Health (NSDUH), and lists the percentage of people belonging to each demographic category who respond to the survey that they participated in illicit drug use in the last month (among persons aged 12 or older). The survey is from 2010. This will serve as a proxy for the ground truth, given that an anonymous survey conducted using careful sampling techniques will undoubtedly be better than arrest data in measuring illicit drug use. We assume that these national drug use rates also hold within each Oakland tract. Use wget to get the file 2010_drug_use.csv.

wget https://raw.githubusercontent.com/eecs298/eecs298.github.io/main/files/2010_drug_use.csv
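Because the file is just a header row plus a single data row, it maps naturally onto csv.DictReader. The sketch below (the function name is hypothetical, and it keeps whatever category headers the CSV happens to contain rather than assuming specific ones) shows one way to load it:

```python
import csv

def read_drug_use_rates(path="2010_drug_use.csv"):
    """Read the two-row NSDUH file into {category: rate}, with rates as floats."""
    with open(path, encoding="utf-8-sig") as f:
        rows = list(csv.DictReader(f))
    # The file has exactly one data row; its keys are the category headers.
    return {category: float(rate) for category, rate in rows[0].items()}
```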

This file consists of only two rows: the CSV header and the drug use percentages for the following categories:

Part 1 - HW3.py

Use wget to download the starter file and PredPol model file.

wget https://raw.githubusercontent.com/eecs298/eecs298.github.io/main/homeworks/HW3.py
wget https://raw.githubusercontent.com/eecs298/eecs298.github.io/main/homeworks/pred_pol.py

There are two main classes you will implement: DataWrapper and ProbabilityAnalysis. DataWrapper will handle reading in and processing the data from the csv files so that it may be used in the PredPol model and for the probability analysis. ProbabilityAnalysis will use the PredPol model and DataWrapper to compute various probabilities and expectations as explained in the ProbabilityAnalysis section.

In the pred_pol.py file, you will find a function and a class that you should not change. Information for how to use each is given below:

In the HW3.py file, you will find the DataWrapper and ProbabilityAnalysis classes for you to implement. Each class includes constructors and member functions that will be useful for implementing further functions and completing the analysis questions. Information for how to implement each class is given below.

DataWrapper

This class contains all of the data that we need, including the census tracts, the demographic data for each tract, the arrest data for each tract, and drug use data in the population. Details for each of the functions you will write are below.

TIP: See the note about how the tract is stored in 2010_Oakland_Tract_Demographics.csv. It may be helpful to create an inner function here that extracts the tract as a 6-digit string, since this is how tracts are stored in self.tracts (i.e., append "00" to the 4-digit version). For example, a tract written as 4053.01 should be extracted as 405301.
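The conversion described in this tip can be sketched as a small helper (the name normalize_tract is hypothetical; in your implementation this could be the suggested inner function):

```python
def normalize_tract(raw):
    """Convert a demographics-file tract like '4053.01' or '4026'
    into the 6-digit string used in self.tracts ('405301', '402600')."""
    whole, _, frac = str(raw).partition(".")
    # Right-pad the fractional part with zeros, so a missing fraction
    # becomes '00' (i.e., '4026' -> '402600').
    return whole + frac.ljust(2, "0")
```

For example, normalize_tract("4053.01") returns "405301" and normalize_tract("4026") returns "402600".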

Use encoding='utf-8-sig' when you open the file (passed in as a keyword argument) if you run into issues.

{"402600": {0: 0,
            1: 0,
            ...
            10: 1,
            ...
            47: 1,
            ...}
}

TIP: It will help to use a library to process the dates and convert them to timestamps, such as the datetime library.
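Putting the datetime tip together with the nested-dictionary structure shown above, a per-tract log of arrest counts by day index could be built along these lines. The column names "Tract" and "Date", the date format, and the start date are illustrative assumptions only; check arrests.csv for the actual values:

```python
import csv
from datetime import datetime

def build_arrest_counts(path="arrests.csv", start="2009-01-01"):
    """Sketch: map each tract to {day_index: arrest_count}.

    Day indices are days elapsed since the assumed start date.
    """
    start_day = datetime.strptime(start, "%Y-%m-%d")
    counts = {}
    with open(path, encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            # Convert the date string to a day index relative to the start
            day = (datetime.strptime(row["Date"], "%Y-%m-%d") - start_day).days
            tract_counts = counts.setdefault(row["Tract"], {})
            tract_counts[day] = tract_counts.get(day, 0) + 1
    return counts
```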

TIP: You might find the dictionary get() method useful for filtering the arrests logs.
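For instance, get() with a default of 0 lets you sum counts over a window of days without a KeyError for days that have no entry (a toy illustration, not part of the required interface):

```python
# One tract's arrest log: day index -> number of arrests (toy data)
arrests_by_day = {10: 1, 47: 1}

# Sum arrests over the first 30 days; get() returns 0 for missing days
total_in_window = sum(arrests_by_day.get(day, 0) for day in range(30))
```

Here total_in_window is 1, since only day 10 falls inside the window.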

ProbabilityAnalysis

This class contains functions to compute various probabilities and expectations using the DataWrapper class and the PredPol model. We will use the following random variables throughout the class implementation details and in the following analysis.

H = 1 if sum_t H_t > 0, and H = 0 otherwise

Implementation details for each of the functions you will write are below.

TIP: You can calculate the expected number of times a person of each racial category was arrested for a single given arrest (this number will be no more than 1) and then sum over all arrests, because for any (independent or dependent!) random variables X and Y, E[X+Y]=E[X]+E[Y]. Further, the expected number of times a person of race a was arrested for a single given arrest in tract r is exactly P(A=a|R=r), since we assume each person is arrested uniformly at random from the population in the tract.
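As a toy numeric sketch of this tip (the function name and data layout are hypothetical, not part of the required interface): in a tract with 75 people of race a and 25 of race b, each arrest there contributes 0.75 to a's expected count and 0.25 to b's, and by linearity these contributions simply add across arrests:

```python
def expected_arrests_by_race(arrest_tracts, demographics):
    """arrest_tracts: list with the tract of each arrest (one entry per arrest).
    demographics: {tract: {race: population}}.
    Returns {race: expected number of arrests of people of that race}.
    """
    expected = {}
    for tract in arrest_tracts:
        pops = demographics[tract]
        total = sum(pops.values())
        for race, pop in pops.items():
            # P(A=a | R=r) for this single arrest, summed over arrests
            # (E[X+Y] = E[X] + E[Y] holds regardless of dependence)
            expected[race] = expected.get(race, 0.0) + pop / total
    return expected
```

With demographics {"405301": {"a": 75, "b": 25}} and two arrests in that tract, this returns 1.5 for a and 0.5 for b.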