Some of our R code, with a graph displaying the "evenness" of our data.
As the summer drew to a close, we realized just how little time we had to put together a presentation that would include some preliminary analyses. Thankfully, we had the good fortune to be working with Matt Yergey, who is very capable working with statistics and R, which is the programming language we eventually used to process the data.
By the beginning of the second week, we had to acknowledge that we would not be able to finish entering the data from the binder Katlyn had originally started working on if we wanted to write up any of the data we had already entered. So we left the comfort zone of entering fish lengths and embarked on the more treacherous path of quality control, understanding statistical tests, and communicating science.
This was a very important part of our internship, as so much of our time had been centered on our data. With so many different variables, such as fish count, fish length, collection location, and the time and temperature of the beam trawls, it was a little overwhelming to decide which variables to analyze and how. We read various scientific papers with similar data sets to see which kinds of analyses are usually done on fish populations to determine and compare community structure, and we thought about how we could perform similar tests on part of our data set.
First, we checked the dataset for outliers, such as fish far larger than the others, which may have been entered by mistake. We also looked for stations whose latitudes and longitudes were not on the Oregon coast, or were farther from the coastline than other stations, and then went back and verified that the data had been entered correctly. The online database has a map feature that plots the latitudes and longitudes entered, which proved very useful. We did discover multiple stations in China (a result of longitudes being entered without a negative sign!).
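A minimal sketch of these two checks, written in Python for illustration rather than in R. The records, field names, bounding box, and the "three times the median" threshold are all made-up assumptions, not the project's actual data or rules.

```python
from statistics import median

# Hypothetical records; stations, coordinates and lengths are invented.
records = [
    {"station": "A1", "lat": 44.6, "lon": -124.3, "length_mm": 82},
    {"station": "A2", "lat": 44.7, "lon": 124.2, "length_mm": 90},    # minus sign missing
    {"station": "A3", "lat": 45.1, "lon": -124.1, "length_mm": 850},  # probable typo
    {"station": "A4", "lat": 44.9, "lon": -124.4, "length_mm": 77},
    {"station": "A5", "lat": 44.9, "lon": -124.4, "length_mm": 95},
]

# Rough box around the Oregon coast (an assumption for this example).
LAT_RANGE = (41.5, 46.5)
LON_RANGE = (-126.0, -123.5)


def off_coast(rec):
    """True if the station falls outside the rough coastal bounding box."""
    in_lat = LAT_RANGE[0] <= rec["lat"] <= LAT_RANGE[1]
    in_lon = LON_RANGE[0] <= rec["lon"] <= LON_RANGE[1]
    return not (in_lat and in_lon)


def too_long(rec, typical, factor=3.0):
    """True if the length is more than `factor` times the typical length."""
    return rec["length_mm"] > factor * typical


typical_length = median(r["length_mm"] for r in records)
suspect_positions = [r["station"] for r in records if off_coast(r)]
suspect_lengths = [r["station"] for r in records if too_long(r, typical_length)]

print("check coordinates:", suspect_positions)
print("check lengths:", suspect_lengths)
```

With this toy data, station A2 is flagged for its positive longitude and A3 for its length. Flagged rows are candidates to check against the original data sheets, not entries to delete automatically.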
Using R, we split the historic data from the current data and then filtered it further, keeping only those fish entries that had associated geographic coordinates and depth information. We then created "bins" of stations with similar depths and latitudes and calculated the diversity of each bin using a pre-written function in the R package "vegan".
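The filter-bin-diversity steps can be sketched roughly as follows, in Python for illustration. The catch records, the bin widths (20 m of depth, half a degree of latitude), and the choice of the Shannon index are assumptions; Shannon is the default index of vegan's `diversity` function, but the post does not say which index was actually used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical catch records: (species, depth in m, latitude). None = missing.
catch = [
    ("English sole", 12.0, 44.6),
    ("English sole", 14.0, 44.7),
    ("speckled sanddab", 13.0, 44.6),
    ("speckled sanddab", 35.0, 45.2),
    ("butter sole", 38.0, 45.3),
    ("butter sole", None, 45.3),   # dropped: no depth
    ("English sole", 36.0, None),  # dropped: no latitude
]


def bin_key(depth_m, lat, depth_step=20.0, lat_step=0.5):
    """Label a record with its (depth band, latitude band) bin."""
    return (int(depth_m // depth_step), int(lat // lat_step))


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over species with count > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Keep only records that have both depth and latitude.
complete = [r for r in catch if r[1] is not None and r[2] is not None]

# Count individuals per species within each bin.
bins = defaultdict(Counter)
for species, depth, lat in complete:
    bins[bin_key(depth, lat)][species] += 1

diversity = {key: shannon(list(c.values())) for key, c in bins.items()}
for key, h in sorted(diversity.items()):
    print(key, round(h, 3))
```

Fixed-width bands are only one way to group stations; bins could equally be defined by hand from known depth zones.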
We sorted the data in various ways and wrote instructions to display the information in tables, one of which you can see above. The evenness was found using our diversity index and reflected how close two bins of information were to each other in terms of diversity.
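One common way to derive evenness from a diversity index is Pielou's J, which divides the Shannon diversity H by its maximum possible value ln(S), where S is the number of species present. Whether this is the exact measure behind the graph above is an assumption; the sketch below, in Python with invented counts, only illustrates the calculation.

```python
import math


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over species with count > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H / ln(S); returns None when fewer than two species are present."""
    species_present = sum(1 for c in counts if c > 0)
    if species_present < 2:
        return None  # ln(1) = 0, so J is undefined for a single species
    return shannon(counts) / math.log(species_present)


# Hypothetical species counts for two bins.
even_bin = [10, 10, 10]   # individuals spread equally across species
uneven_bin = [28, 1, 1]   # one species dominates

print(pielou_evenness(even_bin))
print(pielou_evenness(uneven_bin))
```

J is close to 1 when individuals are spread equally across species and falls toward 0 when one species dominates, so two bins with the same number of species can still differ a lot in evenness.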
The final statistical test we ran was an MRPP (multi-response permutation procedure), which basically found and quantified differences in community structure within and between bins. By the end of the internship, literally a few days before our presentations, we were left with the responsibility of deciding which results to display and the best way to display them.
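To give a feel for what an MRPP does, here is a minimal sketch in Python (the vegan package provides an `mrpp` function for this in R). It assumes Euclidean distance between stations, group weights proportional to group size, and invented species counts; these are illustrative choices, not the settings used in our analysis.

```python
import math
import random
from itertools import combinations


def mrpp_delta(samples, groups):
    """Weighted mean of the average within-group pairwise distances."""
    n_total = len(samples)
    delta = 0.0
    for g in sorted(set(groups)):
        members = [s for s, label in zip(samples, groups) if label == g]
        pairs = list(combinations(members, 2))
        if not pairs:
            continue  # a single-station group has no within-group distance
        mean_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
        delta += (len(members) / n_total) * mean_dist
    return delta


def mrpp(samples, groups, permutations=999, seed=0):
    """Return observed delta and a permutation p-value.

    The p-value is the share of label shuffles whose delta is as small as,
    or smaller than, the observed one (counting the observed labeling itself).
    """
    rng = random.Random(seed)
    observed = mrpp_delta(samples, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if mrpp_delta(samples, labels) <= observed + 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical species counts per station (three species), two depth bins.
samples = [
    (10, 2, 0), (9, 3, 1), (11, 1, 0), (10, 2, 1),  # "shallow" stations
    (1, 8, 6), (0, 9, 7), (2, 7, 5), (1, 9, 6),     # "deep" stations
]
groups = ["shallow"] * 4 + ["deep"] * 4

observed, p_value = mrpp(samples, groups)
print(observed, p_value)
```

A small observed delta relative to the shuffled ones means stations within a bin resemble each other more than would be expected by chance, which is the sense in which the bins differ in community structure.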