This lesson is still being designed and assembled (Pre-Alpha version)

Whiskerbook Data Manipulation

Overview

Teaching: 60 min
Exercises: 30 min
Questions
  • How can we manipulate encounter data to produce descriptive information?

Objectives
  • Perform data wrangling and manipulation

  • Subset only high-quality data

  • Divide data into left-only and right-only

First, we will load the dplyr library and read in edf_tdf.csv, the file we produced in an earlier lesson.

library(dplyr)
edf_tdf<-read.csv("edf_tdf.csv")

We previously labeled our encounters manually with information about juveniles.

Let's check this information by looking at the dataframe with ONLY juveniles.

#subset the dataframe to the records labeled as juveniles (Juvenilles == "J")
edf_tdf_J<-edf_tdf[which(edf_tdf$Juvenilles=="J"),]

Now we want to keep only the unique juvenile records, based on the individual ID and the date_Time field. For our purposes, one encounter per individual per day is enough, so we drop any repeats.

We can then check the unique individual IDs of the juveniles to find out how many we encountered during our study.

#subset the dataframe to identify unique individuals
edf_tdf_J<-unique(edf_tdf_J[,c("Marked.Individual","date_Time")])

#pull the individual IDs out
edf_tdf_J_ID<-unique(as.character(edf_tdf_J$Marked.Individual))
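
To see what the two unique() calls are doing, here is a minimal sketch on a made-up data frame. The IDs and dates are invented for illustration and are not taken from edf_tdf.csv.

```r
# Hypothetical toy data: values are invented for illustration only
toy <- data.frame(
  Marked.Individual = c("A1", "A1", "A1", "B2"),
  date_Time         = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01"),
  stringsAsFactors  = FALSE
)

# unique() on a data frame drops rows that repeat the same ID/date pair
toy_unique <- unique(toy[, c("Marked.Individual", "date_Time")])
nrow(toy_unique)   # 3

# unique() on the ID vector gives one entry per individual
toy_ids <- unique(as.character(toy_unique$Marked.Individual))
length(toy_ids)    # 2
```

The first count is the number of individual-by-date records; the second is the number of distinct individuals.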

Let's remove the juveniles from our dataframe, since we won't need them for our analysis or our descriptions.

In our analysis, we will not include juveniles in estimates of density or abundance.

#Remove juveniles from the dataframe (a logical mask stays correct even when no rows match)
edf_tdf<-edf_tdf[!(edf_tdf$Juvenilles %in% "J"),]
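
One edge case is worth knowing about when removing rows: negative indexing with which() returns an empty dataframe if nothing matches, whereas a logical mask keeps every row. The sketch below shows the difference on a made-up dataframe with no juveniles; the values are invented for illustration.

```r
# Hypothetical toy data containing no juveniles at all
toy <- data.frame(
  Marked.Individual = c("A1", "B2", "C3"),
  Juvenilles        = c("", "", NA),
  stringsAsFactors  = FALSE
)

# which() finds no matches, so the negative index is empty and every row is dropped
dropped_all <- toy[-which(toy$Juvenilles == "J"), ]
nrow(dropped_all)   # 0

# A logical mask keeps all rows when nothing matches, including the NA row
kept <- toy[!(toy$Juvenilles %in% "J"), ]
nrow(kept)          # 3
```

Comparing nrow() before and after a removal step is a quick way to catch this kind of silent data loss.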

Next, we subset the data by distinct fields, since the encounter data can include several encounters from one day. For this, we will use dplyr to keep distinct combinations of individual ID, date_Time, Side, and Location.ID. That way, we keep one record per combination and nothing extra.

#get unique records by several fields using distinct
edf_tdf_d<-edf_tdf %>% distinct(Marked.Individual, date_Time, Side, Location.ID, .keep_all = TRUE)
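
As a rough illustration of how distinct() with .keep_all = TRUE behaves, the sketch below uses a made-up dataframe with the same column names. The values are invented, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 2 share the same ID, date, side, and location
toy <- data.frame(
  Marked.Individual = c("A1", "A1", "A1", "B2"),
  date_Time         = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01"),
  Side              = c("L", "L", "R", "L"),
  Location.ID       = c("S1", "S1", "S1", "S2"),
  Quality           = c("H", "M", "H", "L"),
  stringsAsFactors  = FALSE
)

# Keep the first row of each distinct combination, retaining all other columns
toy_d <- toy %>%
  distinct(Marked.Individual, date_Time, Side, Location.ID, .keep_all = TRUE)

nrow(toy_d)     # 3
toy_d$Quality   # "H" "H" "L"
```

Because distinct() keeps the first row it sees for each combination, the Quality value that survives depends on row order. If that matters for your data, sorting before calling distinct() may be worth considering.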

Next, we can create separate dataframes for high-quality encounters (Quality "H") and lower-quality encounters (Quality "L" or "M").

edf_tdf_high<-edf_tdf_d[which(edf_tdf_d$Quality=="H"),]
edf_tdf_low<-edf_tdf_d[which(edf_tdf_d$Quality %in% c("L","M")),]
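
Before splitting on Quality, it can help to check which values the column actually contains, since any record with a missing or unexpected value ends up in neither dataframe. A minimal sketch on an invented vector of quality codes:

```r
# Hypothetical quality codes, including a missing and a blank value
toy_quality <- c("H", "M", "L", "H", NA, "")

# Tabulate every value, showing NA if present
table(toy_quality, useNA = "ifany")

# Count how many records each split would capture
n_high       <- sum(toy_quality %in% "H")
n_low        <- sum(toy_quality %in% c("L", "M"))
n_unassigned <- length(toy_quality) - n_high - n_low
n_unassigned   # 2
```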

Now we will describe the left-hand and right-hand sides, and determine which individuals have both left and right sides recorded and which have only one or the other.

ID_left_right<-edf_tdf_high%>%
  group_by(Marked.Individual, Side)%>%
  count()

Let's subset the data that are annotated as either left or right.

ID_left_right_sub<-ID_left_right[which(ID_left_right$Side %in% c("L", "R")),]

Let's see how many individuals have only the left side or only the right side by counting the records grouped by ID.

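#count the distinct sides per individual (does not depend on how count() treats the existing n column)
ID_left_right_sub_final<-ID_left_right_sub%>%
  group_by(Marked.Individual)%>%
  summarise(n = n_distinct(Side))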

From this, we can separate the individuals with only one side recorded from those with both sides recorded.

OneSided<-ID_left_right_sub_final[which(ID_left_right_sub_final$n==1),]
TwoSided<-ID_left_right_sub_final[which(ID_left_right_sub_final$n==2),]

Now we can pull out the side records for the one-sided individuals and for the two-sided individuals.

ID_oneOnly<-ID_left_right_sub[which(ID_left_right_sub$Marked.Individual %in% OneSided$Marked.Individual),]
ID_twoSided<-ID_left_right_sub[which(ID_left_right_sub$Marked.Individual %in% TwoSided$Marked.Individual),]

From this, we can find the one-sided individuals that are right only or left only.

ID_oneOnly_right<-ID_oneOnly[ID_oneOnly$Side=="R",]
ID_oneOnly_left<-ID_oneOnly[ID_oneOnly$Side=="L",]
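
The steps from counting sides through to the right-only and left-only lists can be sketched end to end on a small made-up dataframe. This is a compact variant for illustration, not the exact code above; the IDs are invented, "B" is a hypothetical code for a non-left/right side, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical high-quality encounters; "B" stands in for a side that is not L or R
toy_high <- data.frame(
  Marked.Individual = c("A1", "A1", "A1", "B2", "C3", "C3", "D4"),
  Side              = c("L", "R", "L", "R", "L", "L", "B"),
  stringsAsFactors  = FALSE
)

# One row per individual and side, left and right only
side_pairs <- toy_high %>%
  filter(Side %in% c("L", "R")) %>%
  distinct(Marked.Individual, Side)

# Number of distinct sides recorded for each individual
sides_per_id <- side_pairs %>%
  group_by(Marked.Individual) %>%
  summarise(n_sides = n())

one_sided_ids <- sides_per_id$Marked.Individual[sides_per_id$n_sides == 1]
two_sided_ids <- sides_per_id$Marked.Individual[sides_per_id$n_sides == 2]

# Split the one-sided individuals by which side was recorded
is_one_sided <- side_pairs$Marked.Individual %in% one_sided_ids
right_only   <- side_pairs$Marked.Individual[is_one_sided & side_pairs$Side == "R"]
left_only    <- side_pairs$Marked.Individual[is_one_sided & side_pairs$Side == "L"]

two_sided_ids   # "A1"
right_only      # "B2"
left_only       # "C3"
```

Individual D4 drops out at the first step because it has no left or right record.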

Now we can subset the high-quality dataframe. Removing the right-only individuals leaves the left dataframe (left-only and two-sided individuals). Similarly, removing the left-only individuals leaves the right dataframe (right-only and two-sided individuals).

edf_left<-edf_tdf_high[!(edf_tdf_high$Marked.Individual %in% ID_oneOnly_right$Marked.Individual),]
edf_right<-edf_tdf_high[!(edf_tdf_high$Marked.Individual %in% ID_oneOnly_left$Marked.Individual),]
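
A small sketch can make clear which individuals end up in each dataframe. The data and the two ID vectors below are invented for illustration; in the lesson they would come from the earlier steps.

```r
# Hypothetical high-quality encounters
toy_high <- data.frame(
  Marked.Individual = c("A1", "A1", "B2", "C3"),
  Side              = c("L", "R", "R", "L"),
  stringsAsFactors  = FALSE
)

# Assumed results of the one-sided classification
right_only_ids <- "B2"
left_only_ids  <- "C3"

# Drop right-only individuals to get the left dataframe, and vice versa
toy_left  <- toy_high[!(toy_high$Marked.Individual %in% right_only_ids), ]
toy_right <- toy_high[!(toy_high$Marked.Individual %in% left_only_ids), ]

sort(unique(toy_left$Marked.Individual))    # "A1" "C3"
sort(unique(toy_right$Marked.Individual))   # "A1" "B2"
```

Note that the two-sided individual A1 keeps both of its records in each result, so toy_left still contains one right-side row. If a later step needs only left-side records, an additional filter on Side would be required.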

Challenge: Camera Operability Matrix

Answer the following questions:

  1. How many individuals were back only?

ID_back_final<-ID_left_right%>%
  group_by(Side)%>%
  count()

From this result, we can see that 9 detections were back, and 3 were tail only.

Key Points

  • Subset left and right only individuals

  • Subset individuals that have encounters that include both left and right sides

  • Subset dataframes that include left only and both sides, and right and both sides