Lab 2 - An analytical detective

Student

Fábio Santos Lobão

Introduction

Crime is an international concern, but it is documented and handled in very different ways in different countries. In the United States, violent crimes and property crimes are recorded by the Federal Bureau of Investigation (FBI). Additionally, each city documents crime, and some cities release data regarding crime rates. The city of Chicago, Illinois releases crime data from 2001 onward online. Chicago is the third most populous city in the United States, with a population of over 2.7 million people. The city of Chicago is shown in the map below, with the state of Illinois highlighted in red.

There are two main types of crimes: violent crimes, and property crimes. In this problem, we’ll focus on one specific type of property crime, called “motor vehicle theft” (sometimes referred to as grand theft auto). This is the act of stealing, or attempting to steal, a car. In this problem, we’ll use some basic data analysis in R to understand the motor vehicle thefts in Chicago.

Please download the file mvtWeek1.csv for this problem (do not open this file in any spreadsheet software before completing this problem because it might change the format of the Date field). Here is a list of descriptions of the variables:

ID: a unique identifier for each observation
Date: the date the crime occurred
LocationDescription: the location where the crime occurred
Arrest: whether or not an arrest was made for the crime (TRUE if an arrest was made, and FALSE if an arrest was not made)
Domestic: whether or not the crime was a domestic crime, meaning that it was committed against a family member (TRUE if it was domestic, and FALSE if it was not domestic)
Beat: the area, or “beat” in which the crime occurred. This is the smallest regional division defined by the Chicago police department.
District: the police district in which the crime occurred. Each district is composed of many beats, and are defined by the Chicago Police Department.
CommunityArea: the community area in which the crime occurred. Since the 1920s, Chicago has been divided into what are called “community areas”, of which there are now 77. The community areas were devised in an attempt to create socially homogeneous regions.
Year: the year in which the crime occurred.
Latitude: the latitude of the location at which the crime occurred.
Longitude: the longitude of the location at which the crime occurred.

Problem 1 - Loading the Data

Read the dataset mvtWeek1.csv into R, using the read.csv function, and call the data frame “mvt”. Remember to navigate to the directory on your computer containing the file mvtWeek1.csv first. It may take a few minutes to read in the data, since it is pretty large. Then, use the str and summary functions to answer the following questions.

mvt <- read.csv("./mvtWeek1.csv")
summary(mvt)

##        ID                      Date       
##  Min.   :1310022   5/16/08 0:00  :    11  
##  1st Qu.:2832144   10/17/01 22:00:    10  
##  Median :4762956   4/13/04 21:00 :    10  
##  Mean   :4968629   9/17/05 22:00 :    10  
##  3rd Qu.:7201878   10/12/01 22:00:     9  
##  Max.   :9181151   10/13/01 22:00:     9  
##                    (Other)       :191582  
##                      LocationDescription   Arrest         Domestic      
##  STREET                        :156564   Mode :logical   Mode :logical  
##  PARKING LOT/GARAGE(NON.RESID.): 14852   FALSE:176105    FALSE:191226   
##  OTHER                         :  4573   TRUE :15536     TRUE :415      
##  ALLEY                         :  2308   NA's :0         NA's :0        
##  GAS STATION                   :  2111                                  
##  DRIVEWAY - RESIDENTIAL        :  1675                                  
##  (Other)                       :  9558                                  
##       Beat         District     CommunityArea        Year     
##  Min.   : 111   Min.   : 1.00   Min.   : 0      Min.   :2001  
##  1st Qu.: 722   1st Qu.: 6.00   1st Qu.:22      1st Qu.:2003  
##  Median :1121   Median :10.00   Median :32      Median :2006  
##  Mean   :1259   Mean   :11.82   Mean   :38      Mean   :2006  
##  3rd Qu.:1733   3rd Qu.:17.00   3rd Qu.:60      3rd Qu.:2009  
##  Max.   :2535   Max.   :31.00   Max.   :77      Max.   :2012  
##                 NA's   :43056   NA's   :24616                 
##     Latitude       Longitude     
##  Min.   :41.64   Min.   :-87.93  
##  1st Qu.:41.77   1st Qu.:-87.72  
##  Median :41.85   Median :-87.68  
##  Mean   :41.84   Mean   :-87.68  
##  3rd Qu.:41.92   3rd Qu.:-87.64  
##  Max.   :42.02   Max.   :-87.52  
##  NA's   :2276    NA's   :2276

1.1 - How many rows

How many rows of data (observations) are in this dataset?

answer = nrow(mvt)
cat(sprintf("Number of rows = %d\n", answer))

## Number of rows = 191641

1.2 - How many variables

How many variables are in this dataset?

answer = ncol(mvt)
cat(sprintf("Number of variables = %d\n", answer))

## Number of variables = 11

1.3 - Maximum value

Using the “max” function, what is the maximum value of the variable “ID”?

answer = max(mvt$ID)
cat(sprintf("Maximum value of \"ID\" = %d\n", answer))

## Maximum value of "ID" = 9181151

1.4 - Minimum value

What is the minimum value of the variable “Beat”?

answer = min(mvt$Beat)
cat(sprintf("Minimum value of \"Beat\" = %d\n", answer))

## Minimum value of "Beat" = 111

1.5 - How many arrest

How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

answer = nrow(mvt[mvt$Arrest == TRUE,])
cat(sprintf("Number of arrest = %d\n", answer))

## Number of arrest = 15536

1.6 - How many observations on ALLEY

How many observations have a LocationDescription value of ALLEY?

answer = nrow(mvt[mvt$LocationDescription == 'ALLEY',])
cat(sprintf("Number of LocationDescription with \"ALLEY\" = %d\n", answer))

## Number of LocationDescription with "ALLEY" = 2308

Problem 2 - Understanding Dates in R

In many datasets, like this one, you have a date field. Unfortunately, R does not automatically recognize entries that look like dates. We need to use a function in R to extract the date and time. Take a look at the first entry of Date (remember to use square brackets when looking at a certain entry of a variable).

2.1 What is the date format

In what format are the entries in the variable Date?

# display the labels of a few elements at spreed across the dataset
cat("Data is of class: ",class(mvt$Date),"\n\n",
    "Element      1:",as.character(mvt[1,"Date"]),"\n",
    "Element      2:",as.character(mvt[2,"Date"]),"\n",
    "Element      5:",as.character(mvt[5,"Date"]),"\n",
    "Element    100:",as.character(mvt[100,"Date"]),"\n",
    "Element   1000: ",as.character(mvt[1000,"Date"]),"\n",
    "Element 10 000: ",as.character(mvt[10000,"Date"]),"\n",
    "Element 20 000:  ",as.character(mvt[20000,"Date"]),"\n",
    "Element 30 000:",as.character(mvt[30000,"Date"]),"\n")

## Data is of class:  factor 
## 
##  Element      1: 12/31/12 23:15 
##  Element      2: 12/31/12 22:00 
##  Element      5: 12/31/12 21:30 
##  Element    100: 12/29/12 14:00 
##  Element   1000:  12/6/12 19:30 
##  Element 10 000:  4/17/12 12:00 
##  Element 20 000:   8/4/11 22:30 
##  Element 30 000: 12/26/10 20:00

Answer: Evaluating a sample of the data, one may infer that the information is presented in descending chronological order, as such, the variable date must be in the format “%m/%d/%y %H:%M”.

2.2 - Median date

Now, let’s convert these characters into a Date object in R. In your R console, type

# set locality to present date information in english
answer = Sys.setlocale("LC_TIME","English_United States.1252")

DateConvert = as.Date(strptime(mvt$Date, "%m/%d/%y %H:%M"))

This converts the variable “Date” into a Date object in R. Take a look at the variable DateConvert using the summary function.

What is the month and year of the median date in our dataset?

answer = median(DateConvert)
cat(sprintf("Median date on the dataset = %s\n", format(answer, format="%B %Y")))

## Median date on the dataset = May 2006

2.3 - Month with fewest thefts

Now, let’s extract the month and the day of the week, and add these variables to our data frame mvt. We can do this with two simple functions. Type the following commands in R:

mvt$Month = months(DateConvert)

mvt$Weekday = weekdays(DateConvert)

mvt$Date = DateConvert

# delete from worspace variables that are unnecesary for later processing 
rm(DateConvert)

This creates two new variables in our data frame, Month and Weekday, and sets them equal to the month and weekday values that we can extract from the Date object. Lastly, replace the old Date variable with DateConvert by typing:

Using the table command, answer the following questions.

In which month did the fewest motor vehicle thefts occur?

MonthCount = table(mvt$Month)

answer = names(MonthCount[MonthCount==min(MonthCount)])

cat(sprintf("Month with fewest theft = %s\n", answer))

## Month with fewest theft = February

# delete from worspace variables that are unnecesary for later processing 
rm(MonthCount)

2.4 - Weekday with fewest thefts

On which weekday did the most motor vehicle thefts occur?

WeekdayCount = table(mvt$Weekday)

answer = names(WeekdayCount[WeekdayCount==max(WeekdayCount)])

cat(sprintf("Weekday with fewest theft = %s\n", answer))

## Weekday with fewest theft = Friday

# delete from worspace variables that are unnecesary for later processing 
rm(WeekdayCount)

2.5 - Month with most thefts and arrest

Each observation in the dataset represents a motor vehicle theft, and the Arrest variable indicates whether an arrest was later made for this theft. Which month has the largest number of motor vehicle thefts for which an arrest was made?

DFArrestTrue = mvt[mvt$Arrest==TRUE,]

DFArrestTrueCount = table(DFArrestTrue$Month)

answer = names(DFArrestTrueCount[DFArrestTrueCount==max(DFArrestTrueCount)])

cat(sprintf("Month with more theft and arrest = %s\n", answer))

## Month with more theft and arrest = January

# delete from worspace variables that are unnecesary for later processing 
rm(DFArrestTrue,DFArrestTrueCount)

Problem 3 - Visualizing Crime Trends

Now, let’s make some plots to help us better understand how crime has changed over time in Chicago. Throughout this problem, and in general, you can save your plot to a file.

First, let’s make a histogram of the variable Date. We’ll add an extra argument, to specify the number of bars we want in our histogram. In your R console, type

hist(mvt$Date, breaks=100)

Looking at the histogram, answer the following questions.

3.1 Trends based on the histogram

3.1.1 - Trend between 2002 - 2011

In general, does it look like crime increases or decreases from 2002 - 2012?

# basic alternative
hist(mvt$Date[mvt$Date>as.Date("2002-01-01") & mvt$Date<as.Date("2012-12-31")], breaks=119)

Answer: Decrease

3.1.2 - Trend between 2005 - 2008

In general, does it look like crime increases or decreases from 2005 - 2008?

hist(mvt$Date[mvt$Date>as.Date("2005-01-01") & mvt$Date<as.Date("2008-12-31")], breaks=71)

Answer: Decrease

3.1.3 - Trend between 2009 - 2011

In general, does it look like crime increases or decreases from 2009 - 2011?

hist(mvt$Date[mvt$Date>as.Date("2009-01-01") & mvt$Date<as.Date("2011-12-31")], breaks=71)

Answer: Increase

3.2 - Which period had more arrests

Now, let’s see how arrests have changed over time. Create a boxplot of the variable “Date”, sorted by the variable “Arrest”. In a boxplot, the bold horizontal line is the median value of the data, the box shows the range of values between the first quartile and third quartile, and the whiskers (the dotted lines extending outside the box) show the minimum and maximum values, excluding any outliers (which are plotted as circles). Outliers are defined by first computing the difference between the first and third quartile values, or the height of the box. This number is called the Inter-Quartile Range (IQR). Any point that is greater than the third quartile plus the IQR or less than the first quartile minus the IQR is considered an outlier.

Does it look like there were more crimes for which arrests were made in the first half of the time period or the second half of the time period? (Note that the time period is from 2001 to 2012, so the middle of the time period is the beginning of 2007.)

First half or Second half?

boxplot(mvt$Date~mvt$Arrest,xlab="Arrests",ylab="Year")

Answer: There were more arrests in the first half, since the box plot for true has its median before 2006 and approximately the third quartile is before 2008, indicating that the arrests were concentrated before that date.

3.3 - In what proportion of thefts was an arrest made

Let’s investigate this further. Use the table function for the next few questions.

For what proportion of motor vehicle thefts in 2001 was an arrest made?

Note: in this question and many others in the course, we are asking for an answer as a proportion. Therefore, your answer should take a value between 0 and 1.

answer = table(mvt$Arrest[mvt$Year==2001])["TRUE"]/table(mvt$Arrest[mvt$Year==2001])["FALSE"]

cat(sprintf("Proportion of arrest in 2001 = %f \n", answer))

## Proportion of arrest in 2001 = 0.116218

3.4 - Proportion thefts with arrest in 2007

For what proportion of motor vehicle thefts in 2007 was an arrest made?

answer = table(mvt$Arrest[mvt$Year==2007])["TRUE"]/table(mvt$Arrest[mvt$Year==2007])["FALSE"]

cat(sprintf("Proportion of arrest in 2007 = %f \n", answer))

## Proportion of arrest in 2007 = 0.092746

3.5 - Proportion thefts with arrest in 2012

For what proportion of motor vehicle thefts in 2012 was an arrest made?

answer = table(mvt$Arrest[mvt$Year==2012])["TRUE"]/table(mvt$Arrest[mvt$Year==2012])["FALSE"]

cat(sprintf("Proportion of arrest in 2012 = %f \n", answer))

## Proportion of arrest in 2012 = 0.040614

Problem 4 - Popular Locations

4.1 - Where should they focus their efforts

Analysing this data could be useful to the Chicago Police Department when deciding where to allocate resources. If they want to increase the number of arrests that are made for motor vehicle thefts, where should they focus their efforts?

We want to find the top five locations where motor vehicle thefts occur. If you create a table of the LocationDescription variable, it is unfortunately very hard to read since there are 78 different locations in the data set. By using the sort function, we can view this same table, but sorted by the number of observations in each category. In your R console, type:

sort(table(mvt$LocationDescription))

## 
##     AIRPORT BUILDING NON-TERMINAL - SECURE AREA 
##                                               1 
##                  AIRPORT EXTERIOR - SECURE AREA 
##                                               1 
##                                 ANIMAL HOSPITAL 
##                                               1 
##                                 APPLIANCE STORE 
##                                               1 
##                                       CTA TRAIN 
##                                               1 
##                         JAIL / LOCK-UP FACILITY 
##                                               1 
##                                       NEWSSTAND 
##                                               1 
##                                          BRIDGE 
##                                               2 
##               COLLEGE/UNIVERSITY RESIDENCE HALL 
##                                               2 
##                               CURRENCY EXCHANGE 
##                                               2 
##                                   BOWLING ALLEY 
##                                               3 
##                                  CLEANING STORE 
##                                               3 
##                           MEDICAL/DENTAL OFFICE 
##                                               3 
##                              ABANDONED BUILDING 
##                                               4 
## AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA 
##                                               4 
##                                      BARBERSHOP 
##                                               4 
##                  LAKEFRONT/WATERFRONT/RIVERBANK 
##                                               4 
##                                         LIBRARY 
##                                               4 
##                                SAVINGS AND LOAN 
##                                               4 
##  AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA 
##                                               5 
##                                   CHA APARTMENT 
##                                               5 
##                                 DAY CARE CENTER 
##                                               5 
##                                    FIRE STATION 
##                                               5 
##                                 FOREST PRESERVE 
##                                               6 
##                                            BANK 
##                                               7 
##                               CONVENIENCE STORE 
##                                               7 
##                                      DRUG STORE 
##                                               8 
##                 OTHER COMMERCIAL TRANSPORTATION 
##                                               8 
##                                   ATHLETIC CLUB 
##                                               9 
##                   AIRPORT VENDING ESTABLISHMENT 
##                                              10 
##                             AIRPORT PARKING LOT 
##                                              11 
##                       SCHOOL, PRIVATE, BUILDING 
##                                              14 
##                             TAVERN/LIQUOR STORE 
##                                              14 
##                  FACTORY/MANUFACTURING BUILDING 
##                                              16 
##                                   BAR OR TAVERN 
##                                              17 
##                                       WAREHOUSE 
##                                              17 
##                             MOVIE HOUSE/THEATER 
##                                              18 
##                         RESIDENCE PORCH/HALLWAY 
##                                              18 
##                    NURSING HOME/RETIREMENT HOME 
##                                              21 
##                                         TAXICAB 
##                                              21 
##                                DEPARTMENT STORE 
##                                              22 
##                              HIGHWAY/EXPRESSWAY 
##                                              22 
##                        SCHOOL, PRIVATE, GROUNDS 
##                                              23 
##                              VEHICLE-COMMERCIAL 
##                                              23 
##              AIRPORT EXTERIOR - NON-SECURE AREA 
##                                              24 
##               OTHER RAILROAD PROP / TRAIN DEPOT 
##                                              28 
##                              SMALL RETAIL STORE 
##                                              33 
##                               CONSTRUCTION SITE 
##                                              35 
##                                        CAR WASH 
##                                              44 
##                      COLLEGE/UNIVERSITY GROUNDS 
##                                              47 
##                    GOVERNMENT BUILDING/PROPERTY 
##                                              48 
##                                      RESTAURANT 
##                                              49 
##               CHURCH/SYNAGOGUE/PLACE OF WORSHIP 
##                                              56 
##                              GROCERY FOOD STORE 
##                                              80 
##                       HOSPITAL BUILDING/GROUNDS 
##                                             101 
##                        SCHOOL, PUBLIC, BUILDING 
##                                             114 
##                                     HOTEL/MOTEL 
##                                             124 
##                    COMMERCIAL / BUSINESS OFFICE 
##                                             126 
##                     CTA GARAGE / OTHER PROPERTY 
##                                             148 
##                            SPORTS ARENA/STADIUM 
##                                             166 
##                                       APARTMENT 
##                                             184 
##                         SCHOOL, PUBLIC, GROUNDS 
##                                             206 
##                                   PARK PROPERTY 
##                                             255 
##                 POLICE FACILITY/VEH PARKING LOT 
##                                             266 
##                                AIRPORT/AIRCRAFT 
##                                             363 
##                         CHA PARKING LOT/GROUNDS 
##                                             405 
##                                        SIDEWALK 
##                                             462 
##                          VEHICLE NON-COMMERCIAL 
##                                             817 
##                                 VACANT LOT/LAND 
##                                             985 
##                                RESIDENCE-GARAGE 
##                                            1176 
##                                       RESIDENCE 
##                                            1302 
##                   RESIDENTIAL YARD (FRONT/BACK) 
##                                            1536 
##                          DRIVEWAY - RESIDENTIAL 
##                                            1675 
##                                     GAS STATION 
##                                            2111 
##                                           ALLEY 
##                                            2308 
##                                           OTHER 
##                                            4573 
##                  PARKING LOT/GARAGE(NON.RESID.) 
##                                           14852 
##                                          STREET 
##                                          156564

Which locations are the top five locations for motor vehicle thefts, excluding the “Other” category? You should select 5 of the following options.

DFLocationNotOther = mvt[mvt$LocationDescription!="OTHER",]
answer = names(head(sort(table(DFLocationNotOther$LocationDescription), decreasing = TRUE),5))

for (IndexNumber in 1:5) {
  cat(sprintf("Location #%d: %s\n",IndexNumber,answer[IndexNumber]))
}

## Location #1: STREET
## Location #2: PARKING LOT/GARAGE(NON.RESID.)
## Location #3: ALLEY
## Location #4: GAS STATION
## Location #5: DRIVEWAY - RESIDENTIAL

# delete from worspace variables that are unnecesary for later processing 
rm(DFLocationNotOther)

Problem 4.2 - Creating subsets of the top locations

Create a subset of your data, only taking observations for which the theft happened in one of these five locations, and call this new data set “Top5”. To do this, you can use the | symbol. In lecture, we used the & symbol to use two criteria to make a subset of the data. To only take observations that have a certain value in one variable or the other, the | character can be used in place of the & symbol. This is also called a logical “or” operation.

Alternately, you could create five different subsets, and then merge them together into one data frame using rbind.

How many observations are in Top5?

#repeat the previous in order to retreave the name of the top 5 locations
DFLocationNotOther = mvt[mvt$LocationDescription!="OTHER",]
DFLocationNotOtherCount = names(head(sort(table(DFLocationNotOther$LocationDescription), decreasing = TRUE),5))

#create a new data frame with the top five locations
DFTop5 = mvt[(mvt$LocationDescription==DFLocationNotOtherCount[1]),]
for (IndexNumber in 2:5) {
  DFTop5 = rbind(DFTop5,mvt[(mvt$LocationDescription==DFLocationNotOtherCount[IndexNumber]),])
} 

# count the rows
answer = nrow(DFTop5)
  
cat(sprintf("The Top5 crime locations includes %d observations\n",answer))

## The Top5 crime locations includes 177510 observations

# delete from worspace variables that are unnecesary for later processing 
rm(DFLocationNotOther,DFLocationNotOtherCount,IndexNumber)

R will remember the other categories of the LocationDescription variable from the original dataset, so running table(Top5$LocationDescription) will have a lot of unnecessary output.

table(DFTop5$LocationDescription)

## 
##                              ABANDONED BUILDING 
##                                               0 
## AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA 
##                                               0 
##     AIRPORT BUILDING NON-TERMINAL - SECURE AREA 
##                                               0 
##              AIRPORT EXTERIOR - NON-SECURE AREA 
##                                               0 
##                  AIRPORT EXTERIOR - SECURE AREA 
##                                               0 
##                             AIRPORT PARKING LOT 
##                                               0 
##  AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA 
##                                               0 
##                   AIRPORT VENDING ESTABLISHMENT 
##                                               0 
##                                AIRPORT/AIRCRAFT 
##                                               0 
##                                           ALLEY 
##                                            2308 
##                                 ANIMAL HOSPITAL 
##                                               0 
##                                       APARTMENT 
##                                               0 
##                                 APPLIANCE STORE 
##                                               0 
##                                   ATHLETIC CLUB 
##                                               0 
##                                            BANK 
##                                               0 
##                                   BAR OR TAVERN 
##                                               0 
##                                      BARBERSHOP 
##                                               0 
##                                   BOWLING ALLEY 
##                                               0 
##                                          BRIDGE 
##                                               0 
##                                        CAR WASH 
##                                               0 
##                                   CHA APARTMENT 
##                                               0 
##                         CHA PARKING LOT/GROUNDS 
##                                               0 
##               CHURCH/SYNAGOGUE/PLACE OF WORSHIP 
##                                               0 
##                                  CLEANING STORE 
##                                               0 
##                      COLLEGE/UNIVERSITY GROUNDS 
##                                               0 
##               COLLEGE/UNIVERSITY RESIDENCE HALL 
##                                               0 
##                    COMMERCIAL / BUSINESS OFFICE 
##                                               0 
##                               CONSTRUCTION SITE 
##                                               0 
##                               CONVENIENCE STORE 
##                                               0 
##                     CTA GARAGE / OTHER PROPERTY 
##                                               0 
##                                       CTA TRAIN 
##                                               0 
##                               CURRENCY EXCHANGE 
##                                               0 
##                                 DAY CARE CENTER 
##                                               0 
##                                DEPARTMENT STORE 
##                                               0 
##                          DRIVEWAY - RESIDENTIAL 
##                                            1675 
##                                      DRUG STORE 
##                                               0 
##                  FACTORY/MANUFACTURING BUILDING 
##                                               0 
##                                    FIRE STATION 
##                                               0 
##                                 FOREST PRESERVE 
##                                               0 
##                                     GAS STATION 
##                                            2111 
##                    GOVERNMENT BUILDING/PROPERTY 
##                                               0 
##                              GROCERY FOOD STORE 
##                                               0 
##                              HIGHWAY/EXPRESSWAY 
##                                               0 
##                       HOSPITAL BUILDING/GROUNDS 
##                                               0 
##                                     HOTEL/MOTEL 
##                                               0 
##                         JAIL / LOCK-UP FACILITY 
##                                               0 
##                  LAKEFRONT/WATERFRONT/RIVERBANK 
##                                               0 
##                                         LIBRARY 
##                                               0 
##                           MEDICAL/DENTAL OFFICE 
##                                               0 
##                             MOVIE HOUSE/THEATER 
##                                               0 
##                                       NEWSSTAND 
##                                               0 
##                    NURSING HOME/RETIREMENT HOME 
##                                               0 
##                                           OTHER 
##                                               0 
##                 OTHER COMMERCIAL TRANSPORTATION 
##                                               0 
##               OTHER RAILROAD PROP / TRAIN DEPOT 
##                                               0 
##                                   PARK PROPERTY 
##                                               0 
##                  PARKING LOT/GARAGE(NON.RESID.) 
##                                           14852 
##                 POLICE FACILITY/VEH PARKING LOT 
##                                               0 
##                                       RESIDENCE 
##                                               0 
##                                RESIDENCE-GARAGE 
##                                               0 
##                         RESIDENCE PORCH/HALLWAY 
##                                               0 
##                   RESIDENTIAL YARD (FRONT/BACK) 
##                                               0 
##                                      RESTAURANT 
##                                               0 
##                                SAVINGS AND LOAN 
##                                               0 
##                       SCHOOL, PRIVATE, BUILDING 
##                                               0 
##                        SCHOOL, PRIVATE, GROUNDS 
##                                               0 
##                        SCHOOL, PUBLIC, BUILDING 
##                                               0 
##                         SCHOOL, PUBLIC, GROUNDS 
##                                               0 
##                                        SIDEWALK 
##                                               0 
##                              SMALL RETAIL STORE 
##                                               0 
##                            SPORTS ARENA/STADIUM 
##                                               0 
##                                          STREET 
##                                          156564 
##                             TAVERN/LIQUOR STORE 
##                                               0 
##                                         TAXICAB 
##                                               0 
##                                 VACANT LOT/LAND 
##                                               0 
##                              VEHICLE-COMMERCIAL 
##                                               0 
##                          VEHICLE NON-COMMERCIAL 
##                                               0 
##                                       WAREHOUSE 
##                                               0

To make our tables a bit nicer to read, we can refresh this factor variable. In your R console, type:

DFTop5$LocationDescription = factor(DFTop5$LocationDescription)

If you run the str or table function on Top5 now, you should see that LocationDescription now only has 5 values, as we expect.

table(DFTop5$LocationDescription)

## 
##                          ALLEY         DRIVEWAY - RESIDENTIAL 
##                           2308                           1675 
##                    GAS STATION PARKING LOT/GARAGE(NON.RESID.) 
##                           2111                          14852 
##                         STREET 
##                         156564

Use the Top5 data frame to answer the remaining questions.

Problem 4.3 - Which Location has much higher arrest rate

One of the locations has a much higher arrest rate than the other locations. Which is it? Please enter the text in exactly the same way as how it looks in the answer options for Problem 4.1.

LocationArrestCount = with(DFTop5,table(Arrest,LocationDescription))
answer = names(sort(LocationArrestCount["TRUE",]/(LocationArrestCount["TRUE",]+LocationArrestCount["FALSE",]), decreasing = TRUE)[1])

cat(sprintf("Location with highest arrest rate is: \"%s\"\n",answer))

## Location with highest arrest rate is: "GAS STATION"

# delete from worspace variables that are unnecesary for later processing 
rm(LocationArrestCount)

Problem 4.4 - Day with more thefts on gas stations

On which day of the week do the most motor vehicle thefts at gas stations happen?

DFGasStation = DFTop5[DFTop5$LocationDescription!="GAS STATION",]

answer = names(sort(table(DFGasStation$Weekday))[1])

cat(sprintf("Day of the week with more thefts on gas stations: \"%s\"\n",answer))

## Day of the week with more thefts on gas stations: "Sunday"

# delete from worspace variables that are unnecesary for later processing 
rm(DFGasStation)

Problem 4.5 - Day with fewer theft on driveways

On which day of the week do the fewest motor vehicle thefts in residential driveways happen?

DFResidential = DFTop5[DFTop5$LocationDescription!="DRIVEWAY - RESIDENTIAL",]

answer = names(sort(table(DFResidential$Weekday), decreasing = TRUE)[1])

cat(sprintf("Day of the week with fewer theft on driveways: \"%s\"\n",answer))

## Day of the week with fewer theft on driveways: "Friday"

# delete from worspace variables that are unnecesary for later processing 
rm(DFResidential,DFTop5,answer,mvt)

BAN - Business Analytics