Date: 2021-03-05
Author: Björn Andersson

# Prediction of Most Probable Number (MPN) and contrast with microsphere contamination

# Structure and Files

The input directory (input is needed for MPN computations, input2 for plots of data) stores temporary files. The Results directory stores MPN output data from the for-i-loop in the R-script (Results is needed for MPN computations, Result2 for normalization to relative abundances and plotting). **MPNresults.txt** is the main input data file for MPN analysis, but **BEADresults.txt** is also integrated to show contamination. **Survival.txt** is read and plotted, but this is completely independent of the flow of the script and can be omitted. The headers and structure of the input file need to be retained for the R-script to work without modifications, but the data or indexing in individual columns can be exchanged (e.g. edited in Excel and saved as a tab-delimited .txt file).

**MPN_template.R** is the R-script that processes the input files and computes MPN for each individual species column (which contains binary observations of presence/absence in a specific replicate, coded as 1 for presence and 0 for absence).

# Input file **MPNresults.txt**

The columns are:

Core; refers to a specific core/sample

Plate; Index that separates all samples (important in the R-analysis)

Depth; depth in core which is used for data plotting only.

Dilution; Which dilution step (important for data summary step before MPN)

g sediment rep-1; Specific sediment amount (wet-weight) in each dilution step (this is what is used for MPN computation)

Well; Not used in R-script but useful when counting and making observations.

Replicate; indicates which replicate of the dilution series is used. Note that if this number (3 in this example) is not the same for every sample in the dataset, the script will not work properly and needs to be re-coded.

The rest of the columns hold presence/absence (1/0) observations for individual taxa.

SM; *S. marinoi*, 
TB; *T. baltica*, 
CSSP; *Chaetoceros* spp., 
DSSP; *Dolichospermum affine*/spp., 
FL; small flagellates, 
NS; *Nodularia spumigena*, 
BD; Pennate benthic diatoms [at least 10 different species, mainly Naviculoids and Amphoroids at high concentration], 
DF; Dinoflagellates [>3 species, very infrequently observed], 
Others; any other species outside these categories [almost exclusively *Fragilariopsis* spp. at heavy dilution, so in the end it is re-coded as this species. But *Melosira* spp., other chain-forming diatoms, coccoid green algae/cyanobacteria or other microalgae were also seen in less diluted samples]
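
Because the script depends on every sample having the same number of replicates, a quick check before running R can save some debugging. The following is only a sketch and not part of **MPN_template.R**: the function name `check_replicates` is hypothetical, and it assumes that the file is tab-delimited, that each well is one row, and that the replicates of a dilution step are the rows sharing the same **Plate** and **Dilution** values.

```shell
# check_replicates FILE [EXPECTED]   (hypothetical helper, not in the R-script)
# Counts the data rows per Plate/Dilution pair in a tab-delimited file and
# prints every pair whose count differs from EXPECTED (default 3).
# Columns are located by header name, so their position does not matter.
# Exit status: 0 if all pairs match, 1 otherwise.
check_replicates() {
    awk -F'\t' -v expected="${2:-3}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "Plate") p = i
                if ($i == "Dilution") d = i
            }
            if (!p || !d) { print "missing Plate or Dilution column"; bad = 1; exit }
            next
        }
        NF { n[$p "\t" $d]++ }          # skip empty lines, count the rest
        END {
            for (k in n)
                if (n[k] != expected) { print k "\t" n[k]; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}
```

Calling `check_replicates MPNresults.txt 3` prints nothing when every Plate/Dilution pair has three rows; otherwise it lists plate, dilution and the row count found.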


# R SCRIPT

The R-script mainly uses the MPN package for analysis.

Martine Ferguson and John Ihrie (2019). MPN: Most Probable Number and Other Microbial
  Enumeration Techniques. R package version 0.3.0. https://CRAN.R-project.org/package=MPN
  
It also uses a set of packages for plotting and data handling that can be omitted or worked around if needed.

The first step is to read in the data and conform it to a format that the MPN package can use.

In the for-i-loops, MPN is computed individually (using column **g sediment rep-1** and the binary observations in separate columns) based on the indexing in column **Plate**. Graphs are made of MPN vs **Depth**, separated by **Core** and the species columns **(SM, TB, CSSP, FL, BD, NS)**. For the script to work, these names need to be conserved, or changed in the R script as well as in the input file.
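
To cross-check what goes into an MPN estimate for one plate, it can help to tally the positive wells per dilution step outside R. This is a sketch under assumptions, not how the R-script does it: `tally_positives` is a hypothetical helper, the file is assumed to be tab-delimited with one row per well, and the taxon is selected by its column header (e.g. SM).

```shell
# tally_positives FILE TAXON   (hypothetical helper, not in the R-script)
# For each Plate/Dilution pair, in order of first appearance, prints:
#   plate <TAB> dilution <TAB> positive wells <TAB> total wells
tally_positives() {
    awk -F'\t' -v taxon="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                sub(/\r$/, "", $i)      # tolerate Windows line endings
                if ($i == "Plate") p = i
                if ($i == "Dilution") d = i
                if ($i == taxon) t = i
            }
            if (!p || !d || !t) { bad = 1; exit }
            next
        }
        NF {
            key = $p "\t" $d
            if (!(key in total)) order[++m] = key
            total[key]++
            pos[key] += $t              # observations are coded 1 / 0
        }
        END {
            if (bad) exit 1
            for (j = 1; j <= m; j++)
                printf "%s\t%d\t%d\n", order[j], pos[order[j]], total[order[j]]
        }
    ' "$1" || { echo "tally_positives: column not found" >&2; return 1; }
}
```

For example, `tally_positives MPNresults.txt SM` lists, for every plate and dilution step, how many wells were scored positive for SM out of the wells counted.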

After the for-i-loops, the MPN results ("Index, Species, MPN, 95% low, 95% high") are written to the results folder (individual .csv files for each indexed **Plate**).

The files need to be combined into **MPNresults.txt** (which is actually csv formatted) before moving past line 102. I do this in Bash using the command `cat *.csv > MPNresults.csv`. This file is then read into R again, so the script can be re-entered from this point once the file is made.
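
Two things can go wrong with a plain `cat *.csv` merge: a merged file left over from an earlier run matches `*.csv` on the next run, and header rows get repeated if the per-plate files carry one. The sketch below guards against both. It is a hypothetical helper (`merge_csv` is not part of the repository) and it assumes that every per-plate file starts with the same single header row; if the files have no header, the plain `cat` command is the right tool and this function would drop a data row per file.

```shell
# merge_csv OUTFILE FILE...   (hypothetical helper)
# Concatenates the given CSV files into OUTFILE, keeping the header row of
# the first file only. OUTFILE itself is skipped if it appears among the
# inputs, so the function can be re-run in the same directory.
merge_csv() {
    out=$1
    shift
    tmp=$(mktemp) || return 1
    first=1
    for f in "$@"; do
        [ "$f" = "$out" ] && continue   # leftover merged file from an earlier run
        if [ "$first" -eq 1 ]; then
            cat "$f"                    # first file: keep its header
            first=0
        else
            tail -n +2 "$f"             # later files: drop the header line
        fi
    done > "$tmp" && mv "$tmp" "$out"
}
```

Run inside the results folder, `merge_csv MPNresults.csv *.csv` mirrors the command above, with the output written through a temporary file first.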

In the #Beads#### section, the observations of microspheres in sediment samples are imported from the file "BEADresults.txt" and plotted. The smaller beads BB_1_75um are omitted from analysis and plots since I had difficulties with false positives at low concentrations and therefore stopped counting them (after 8 cm depth). Where available, this data is an okay approximation but may be overestimated. The microsphere/bead data is merged with the MPN data for downstream plots, but also plotted alone.

Then there are plots of the ex-situ change in viability of diatoms in #Survival observations####, which is independent of all other analysis and can be skipped.

Finally, under #MPN species####, the data is formatted and plotted for the phytoplankton, with absolute abundances first. Then another for-i-loop goes through and normalizes the abundances to the peak abundance, to make the beads and MPN data comparable. In the process No observation data is removed, and the data is manipulated to show the detection limit and deal with infinity values on log scales. The data is separated into multiple files in the input2 directory, read in again, manipulated and exported to the Result2 directory. Then the files need to be merged into one and imported again (use `cat *.csv > MPNresults2.txt` in the Result2 directory to generate a file that the current script will read in).

Finally, this data is plotted in two graphs.

# The end