Project Introduction

This project addresses ICES WGSAM ToR c, Skill assessment: establish and apply methods to assess the skill of multispecies models intended for operational advice.

Project meeting notes are posted here.

Participants:

Simulating input data from an ecosystem model

We use existing Atlantis ecosystem model output to generate input datasets for a variety of multispecies models, so that the performance of these models can be evaluated against known (simulated) ecosystem dynamics. Atlantis models simulate a wide range of physical and ecological processes, include a full food web, and can be run using different climate forcing, fishing, and other scenarios.

We extract simulated data using the R package atlantisom. Users of atlantisom can specify fishery independent and fishery dependent sampling in space and time, as well as species-specific catchability, selectivity, and other observation processes for any Atlantis scenario. The atlantisom outputs are internally consistent multispecies and ecosystem datasets with known observation error characteristics, for use in individual model performance testing, comparison of alternative models, and performance testing of model ensembles against “true” Atlantis outputs.

Species in the ms-keyrun dataset

Our initial species selection includes 11 single species groups from the Norwegian Barents Sea (NOBA) Atlantis model [1,2]. These groups are fully age structured. All but two of them are fished.

fgs <- load_fgs(here("SkillAssessment/atlantisoutput/NOBA_sacc_38"), "nordic_groups_v04.csv")

lname <- data.frame(Latin = c("*Hippoglossoides platessoides*",
                              "*Reinhardtius hippoglossoides*",
                              "*Scomber scombrus*",
                              "*Melongrammus aeglefinus*",
                              "*Pollachius virens*",
                              "*Sebastes mentella*",
                              "*Micromesistius poutassou*",
                              "*Clupea harengus*",
                              "*Gadus morhua*",
                              "*Boreogadus saida*",
                              "*Mallotus villosus*"),
                    Code = c("LRD", "GRH", "MAC", "HAD", "SAI", "RED", 
                             "BWH", "SSH", "NCO", "PCO", "CAP")
)

sppsubset <- merge(fgs, lname, all.y = TRUE)
spptable <- sppsubset %>% 
  arrange(Index) %>%
  select(Name, Long.Name, Latin)

knitr::kable(spptable, col.names = c("Model name", "Full name", "Latin name"))
| Model name | Full name | Latin name |
|:---|:---|:---|
| Long_rough_dab | Long rough dab | *Hippoglossoides platessoides* |
| Green_halibut | Greenland halibut | *Reinhardtius hippoglossoides* |
| Mackerel | Mackerel | *Scomber scombrus* |
| Haddock | Haddock | *Melanogrammus aeglefinus* |
| Saithe | Saithe | *Pollachius virens* |
| Redfish | Redfish | *Sebastes mentella* |
| Blue_whiting | Blue whiting | *Micromesistius poutassou* |
| Norwegian_ssh | Norwegian spring spawning herring | *Clupea harengus* |
| North_atl_cod | Northeast Atlantic cod | *Gadus morhua* |
| Polar_cod | Polar cod | *Boreogadus saida* |
| Capelin | Capelin | *Mallotus villosus* |

Below we generate initial survey outputs that reflect initial group decisions. This first dataset will be near-perfect (no additional observation error or bias) and all modelers will know this. The dataset will include survey biomass and catch indices, survey and fishery length and age compositions, and diet information, and possibly other input parameters (e.g., von Bertalanffy growth function parameters).

Generating the initial dataset (June 2021)

Configuration files specified once

NOBA2config.R specifies location and names of files needed for atlantisom to initialize:

d.name <- here("SkillAssessment/atlantisoutput","NOBA_sacc_38")
functional.groups.file <- "nordic_groups_v04.csv"
biomass.pools.file <- "nordic_biol_v23.nc"
biol.prm.file <- "nordic_biol_incl_harv_v_011_1skg.prm"
box.file <- "Nordic02.bgm"
initial.conditions.file <- "nordic_biol_v23.nc"
run.prm.file <- "nordic_run_v01.xml"
scenario.name <- "nordic_runresults_01"
bioind.file <- "nordic_runresults_01BiomIndx.txt"
catch.file <- "nordic_runresults_01Catch.txt"
annage <- TRUE
fisheries.file <- "NoBAFisheries.csv"

omdimensions.R standardizes timesteps, etc. (this is part of atlantisom and should not need to be changed by the user):

#survey species inherited from omlist_ss
survspp <- omlist_ss$species_ss
# survey season and other time dimensioning parameters
# generalized timesteps all models
noutsteps <- omlist_ss$runpar$tstop/omlist_ss$runpar$outputstep
timeall <- c(0:noutsteps)
stepperyr <- if(omlist_ss$runpar$outputstepunit=="days") 365/omlist_ss$runpar$toutinc
midptyr <- round(median(seq(0,stepperyr)))

# model areas, subset in surveyconfig
allboxes <- c(0:(omlist_ss$boxpars$nbox - 1))

# fishery output: learned the hard way this can be different from ecosystem outputs
fstepperyr <- if(omlist_ss$runpar$outputstepunit=="days") 365/omlist_ss$runpar$toutfinc

# survey selectivity (agecl based)
sp_age <- omlist_ss$funct.group_ss[, c("Name", "NumCohorts", "NumAgeClassSize")]

# should return all age classes fully sampled (Atlantis output is 10 age groups per spp)
n_age_classes <- sp_age$NumCohorts
# changed below for multiple species NOTE survspp alphabetical; NOT in order of fgs!!
# this gives correct names
age_classes <- lapply(n_age_classes, seq)
names(age_classes)<-sp_age$Name

n_annages <- sp_age$NumCohorts * sp_age$NumAgeClassSize
# changed below for multiple species
annages <- lapply(n_annages, seq)
names(annages)<-sp_age$Name

Change these survey and fishery config files

mssurvey_spring.R and mssurvey_fall.R configure the fishery independent surveys (in this census test, surveys sample all model polygons in all years and have efficiency of 1 for all species, with no size selectivity):

# Default survey configuration here has a range of efficiencies and selectivities
# To emulate a range of species in a single multispecies survey
# Also now happens in "spring" and "fall"
# Need to define survey season, area, efficiency, selectivity

# Survey name
survey.name="BTS_spring_allbox_effic1"

#Atlantis model timestep corresponding to the true output--now from census_spec.R
timestep <- stepperyr #5

#Which atlantis timestep does the survey run in?--now from census_spec.R
# with 5 output steps per year, 0 is Jan-Feb-midMar, 1 is midMar-Apr-May, 
# 2 is June-July-midAug, 3 is midAug-Sept-Oct, 4 is Nov-Dec (ish)

# No, timestep 0 is initial condition and should be ignored to align 
# snapshots (biomass, numbers) with
# cumulative outputs (fishery catch, numbers)

# with 5 output steps per (non leap) year:
# 1 is day 73, or 14 March
# 2 is day 146, or 26 May
# 3 is day 219, or 7 August
# 4 is day 292, or 19 October
# 5 is day 365, or 31 December

survey_sample_time <- 1 # spring survey

#The last timestep to sample
total_sample <- noutsteps-1 #495

#Vector of indices of survey times to pull
survey_sample_full <- seq(survey_sample_time,
                          total_sample, by=timestep)

survtime <- survey_sample_full

# survey area
# should return all model areas
survboxes <- allboxes

# survey efficiency (q)
# should return a perfectly efficient survey 
surveffic <- data.frame(species=survspp,
                     efficiency=rep(1.0,length(survspp)))

# survey selectivity (agecl based)
# this is by age class, need to change to use with ANNAGEBIO output
#survselex <- data.frame(species=rep(names(age_classes), each=n_age_classes),
#                     agecl=rep(c(1:n_age_classes),length(survspp)),
#                     selex=rep(1.0,length(survspp)*n_age_classes))

# for annage output uses names(annages) NOT alphabetical survspp
survselex <- data.frame(species=rep(names(annages), n_annages), #  
                        agecl=unlist(sapply(n_annages,seq)),
                        selex=rep(1.0,sum(n_annages)))

survselex.agecl <- survselex


# effective sample size needed for sample_fish
# this effective N is high but not equal to total for numerous groups
surveffN <- data.frame(species=survspp, effN=rep(100000, length(survspp)))

# survey index cv needed for sample_survey_xxx
# cv = 0.1
surv_cv <- data.frame(species=survspp, cv=rep(0.1,length(survspp)))

# length at age cv for input into calc_age2length function
# function designed to take one cv for all species, need to change to pass it a vector
lenage_cv <- 0.1

# max size bin for length estimation, function defaults to 150 cm if not supplied
maxbin <- 200

# diet sampling parameters
alphamult <- 10000000
unidprey <- 0
# Default survey configuration here has a range of efficiencies and selectivities
# To emulate a range of species in a single multispecies survey
# Also now happens in "spring" and "fall"
# Need to define survey season, area, efficiency, selectivity

# Survey name
survey.name="BTS_fall_allbox_effic1"

#Atlantis model timestep corresponding to the true output--now from census_spec.R
timestep <- stepperyr #5

#Which atlantis timestep does the survey run in?--now from census_spec.R
# with 5 output steps per year, 0 is Jan-Feb-midMar, 1 is midMar-Apr-May, 
# 2 is June-July-midAug, 3 is midAug-Sept-Oct, 4 is Nov-Dec (ish)

# No, timestep 0 is initial condition and should be ignored to align 
# snapshots (biomass, numbers) with
# cumulative outputs (fishery catch, numbers)

# with 5 output steps per (non leap) year:
# 1 is day 73, or 14 March
# 2 is day 146, or 26 May
# 3 is day 219, or 7 August
# 4 is day 292, or 19 October
# 5 is day 365, or 31 December

survey_sample_time <- 3 # fall survey

#The last timestep to sample
total_sample <- noutsteps-1 #495

#Vector of indices of survey times to pull
survey_sample_full <- seq(survey_sample_time,
                          total_sample, by=timestep)

survtime <- survey_sample_full

# survey area
# should return all model areas
survboxes <- allboxes

# survey efficiency (q)
# should return a perfectly efficient survey 
surveffic <- data.frame(species=survspp,
                     efficiency=rep(1.0,length(survspp)))

# survey selectivity (agecl based)
# this is by age class, need to change to use with ANNAGEBIO output
#survselex <- data.frame(species=rep(survspp, each=n_age_classes),
#                     agecl=rep(c(1:n_age_classes),length(survspp)),
#                     selex=rep(1.0,length(survspp)*n_age_classes))

# for annage output
survselex <- data.frame(species=rep(names(annages), n_annages), #  
                        agecl=unlist(sapply(n_annages,seq)),
                        selex=rep(1.0,sum(n_annages)))

survselex.agecl <- survselex 


# effective sample size needed for sample_fish
# this effective N is high but not equal to total for numerous groups
surveffN <- data.frame(species=survspp, effN=rep(100000, length(survspp)))

# survey index cv needed for sample_survey_xxx
# cv = 0.1
surv_cv <- data.frame(species=survspp, cv=rep(0.1,length(survspp)))

# length at age cv for input into calc_age2length function
# function designed to take one cv for all species, need to change to pass it a vector
lenage_cv <- 0.1

# max size bin for length estimation, function defaults to 150 cm if not supplied
maxbin <- 200

# diet sampling parameters
alphamult <- 10000000
unidprey <- 0

msfishery.R configures the fishery dependent data:

# Default fishery configuration here is a census
fishery.name="census"

# Inherits species from input omlist_ss
fishspp <- omlist_ss$species_ss 

#Number of years of data to pull
nyears <- omlist_ss$runpar$nyears

#Atlantis initialization period in years
burnin <- 0

# fishery output: learned the hard way this can be different from ecosystem outputs
fstepperyr <- if(omlist_ss$runpar$outputstepunit=="days") 365/omlist_ss$runpar$toutfinc


# same time dimensioning parameters as in surveycensus.R
#Vector of indices of catch in numbers to pull (by timestep to sum)
fish_sample_full <- c(0:total_sample)  #total_sample defined in sardinesurvey.R
fish_burnin <- burnin*fstepperyr+1
fish_nyears <- nyears*fstepperyr
fish_times <- fish_sample_full[fish_burnin:(fish_burnin+fish_nyears-1)]
fish_timesteps <- seq(fish_times[fstepperyr], max(fish_times), by=fstepperyr) #last timestep
#fish_years <- unique(floor(fish_times/fstepperyr)+1) # my original
fish_years <- unique(floor(fish_times/fstepperyr)) #from Christine's new sardine_config.R

fishtime <- fish_times


# fishery sampling area
# should return all model areas; this assumes you see everything that is caught
fishboxes <- c(0:(omlist_ss$boxpars$nbox - 1))

# effective sample size needed for sample_fish
# this effective N is divided by the number of annual timesteps below, so 200 per time
# use as input to the length samples, ages can be a subset
fisheffN <- data.frame(species=survspp, effN=rep(1000, length(survspp)))

#this adjusts for subannual fishery output so original effN is for whole year
fisheffN$effN <- fisheffN$effN/fstepperyr 

# fishery catch cv can be used in sample_survey_biomass
# perfect observation
fish_cv <- data.frame(species=survspp, cv=rep(0.01,length(survspp)))

Run atlantisom and save outputs

Datasets are generated as follows, using atlantisom wrapper functions om_init to assemble initial true atlantis data, om_species to subset true data for desired species, om_index to generate survey biomass and total catch biomass indices, om_comps to generate age and length compositions and average weight at age from surveys and fisheries, and om_diet to generate diet from surveys. Outputs are saved to the atlantisoutput folder (not kept on github due to size):

NOBAom <- om_init(here("SkillAssessment/config/NOBA2Config.R"))

# generate results with climate change signal
#NOBAom <- om_init(here("SkillAssessment/config/NOBAsacc30Config.R"))

NOBAom_ms <- om_species(sppsubset$Name, NOBAom)

#need to change internal call to source in atlantisom om_index om_comps and om_diet functions
#expecting a config folder in same directory as rmd
#this is a workaround

#dir.create(file.path(here("docs/config")))

#file.copy(here("SkillAssessment/config/omdimensions.R"), here("docs/config/omdimensions.R"))

NOBAom_ms_ind <- om_index(usersurvey = c(here("SkillAssessment/config/mssurvey_spring.R"), 
                                         here("SkillAssessment/config/mssurvey_fall.R")), 
                           userfishery = here("SkillAssessment/config/msfishery.R"),
                           omlist_ss = NOBAom_ms, 
                           n_reps = 1, 
                           save = TRUE)

NOBAom_ms_comp <- om_comps(usersurvey = c(here("SkillAssessment/config/mssurvey_spring.R"), 
                                         here("SkillAssessment/config/mssurvey_fall.R")), 
                           userfishery = here("SkillAssessment/config/msfishery.R"),
                           omlist_ss = NOBAom_ms, 
                           n_reps = 1, 
                           save = TRUE)


#unlink(here("docs/config"), recursive = TRUE)

The diet file requires pre-processing outside R using the following steps. Even a “bash” chunk in this document still runs through R’s system2(), and my computer can’t handle that with a 1.2 GB (gzipped) input file.

This awk code is executed in terminal from the directory with the .gz file in it:

zcat < nordic_runresults_01DetailedDietCheck.txt.gz | awk 'NR > 1{s=0; for (i=6;i<=NF;i++) s+=$i; if (s!=0)print}' | gzip > DetDiet.gz

This has been run on macOS; I do not know whether it works on Windows. On Linux the command would just be zcat filename; macOS needs zcat < filename to work. The awk statement says: after the header row (NR > 1), sum the values from column 6 to the last column and keep the row only if the sum is nonzero. The last pipe writes the result to a new gzipped file.

Then get the header back into the file. cat works on two gzipped files, so the first line below checks the header, the second makes a gzipped version of it, and the third concatenates the header and the data.

zcat < nordic_runresults_01DetailedDietCheck.txt.gz | head -n1 > DetDietHead.txt
zcat < nordic_runresults_01DetailedDietCheck.txt.gz | head -n1 | gzip > DetDietHead.gz
cat DetDietHead.gz DetDiet.gz > NOBADetDiet.gz
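
A quick sanity check of the result without leaving R (a minimal sketch, assuming NOBADetDiet.gz is in the working directory): the first line read back should be the header and the second the first nonzero data row.

con <- gzfile("NOBADetDiet.gz", "rt")
readLines(con, n = 2) # line 1 = header, line 2 = first data row
close(con)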

Then we can run om_diet using NOBADetDiet.gz as input:

# add this to config; reading from om_diet isnt working
file_diet <- "NOBADetDiet.gz"

#sub in file with climate signal "NOBAsacc30config.R"

NOBAom_ms_diet <- om_diet(config = here("SkillAssessment/config", "NOBA2config.R"),
                          dietfile = "NOBADetDiet.gz",
                          usersurvey = c(here("SkillAssessment/config/mssurvey_spring.R"), 
                                         here("SkillAssessment/config/mssurvey_fall.R")), 
                          omlist_ss = NOBAom_ms, 
                          n_reps = 1, 
                          save = TRUE)

Create input files for each multispecies model

Code here takes the saved atlantisom .rds output files and generates model input files.

We will use a subset of the full NOBA Atlantis output, leaving out the model spin-up period and the projection period at the end of the run. The goal is an 80-year dataset for model testing, starting before the onset of fishing and including all of the built-in F contrast.

# if not your first rodeo, read in the existing NOBAom_ms 

source(here("SkillAssessment/config/NOBA2Config.R"))

#climate signal
#source(here("SkillAssessment/config/NOBAsacc30Config.R"))

NOBAom_ms <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds")))
fitstartyr <- 40
fitendyr <- 120  #use 110 for sacc30

fitstartyr <- fitstartyr - 1 #needed to start at year entered above

#Number of years of data to pull
nyears <- NOBAom_ms$runpar$nyears
total_sample <- NOBAom_ms$runpar$tstop/NOBAom_ms$runpar$outputstep
stepperyr <- if(NOBAom_ms$runpar$outputstepunit=="days") 365/NOBAom_ms$runpar$toutinc

# also need to change atlantis_full to not include 0 timestep
atlantis_full <- c(1:total_sample)  
mod_burnin <- fitstartyr*stepperyr+1
fit_nyears <- fitendyr-fitstartyr
fit_ntimes <- fit_nyears*stepperyr
fittimes <- atlantis_full[mod_burnin:(mod_burnin+fit_ntimes-1)]
#fit_timesteps <- seq(fittimes[stepperyr], max(fittimes), by=stepperyr) #last timestep
#fit_years <- unique(floor(fittimes/stepperyr)) #from Christine's new sardine_config.R #don't use floor
fittimes.days <- fittimes*NOBAom_ms$runpar$toutinc

Length-structured multispecies model (Hydra)

For the length-based model hydra, we can write .csv files to be used as input in the hydradata package, which writes .dat and .pin files.

Hydra needs survey biomass time series, catch time series, survey length composition, fishery length composition, and diet composition data. Hydra also needs temperature data for the consumption equation, input parameters for growth (von Bertalanffy or exponential function parameters), and size bin definitions.

Hydra inputs will need to be in the same species order for each dataset, so first we make a species file and some of the other input data that can come straight from Atlantis.

#source(here("SkillAssessment/config/NOBA2Config.R"))

#climate signal
#source(here("SkillAssessment/config/NOBAsacc30Config.R"))

#all_diets <- atlantisom::read_savedsurvs(d.name, 'survDiet')

# divide time.days by runpar$toutinc to match time in biomass
# aggregate over ages using true biomass at age (survey diet is in proportion by age)

#NOBAom_ms <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds")))

#stepperyr <- if(NOBAom_ms$runpar$outputstepunit=="days") 365/NOBAom_ms$runpar$toutinc

#read in survey biomass data
survObsBiom <- read_savedsurvs(d.name, 'survB') #reads in all surveys


# foodweb and species file identify predators and prey; take from true atlantis diet comp
diettrue <- atlantisom::load_diet_comp(d.name, 
                           "nordic_runresults_01DietCheck.txt", fgs = fgs)
ms_diet <- diettrue %>%
  filter(species %in% sppsubset$Name) %>%
  filter(atoutput>0) %>%
  mutate(species = as.character(species),
         prey = as.character(prey))

predPrey <- ms_diet %>%
  filter(prey %in% unique(species)) %>%
  select(species, agecl, time.days, prey, atoutput) %>%
  group_by(species, prey) %>%
  summarise(avgpreyprop = mean(atoutput))

foodweb <- left_join(sppsubset, predPrey, by=c("Name" = "species")) %>%
  select(species=Name, prey, avgpreyprop) %>%
  mutate(avgpreyprop = ceiling(avgpreyprop)) %>%
  spread(species, avgpreyprop) %>%
  filter(!is.na(prey)) %>%
  replace(is.na(.), 0)
  
predOrPrey <- ifelse(colSums(foodweb[-1])>0, 1, 0)

# warning--the fall survey gets 11 species but the spring only gets 10 due to migration so 
# this isn't bugproof; ensure your survey gets all the species to match guildmember hardcoding
spname <- unique(unique(survObsBiom[[1]][[1]]$species))

specieslist <- data.frame(species = spname,
                          guild = seq(1:length(spname)),
                          guildmember = seq(1:length(spname)),
                          predOrPrey = predOrPrey
)

# take lenwt parameters from true atlantis pars
lenwtpar <-  left_join(sppsubset, NOBAom_ms$biol$wl, by=c("Code" = "group")) %>%
  select(Name, a, b) %>%
  arrange(Name)

# fixed parameters:
# intake same for all teleosts
intakespp  <- rbind(species = sort.default(sppsubset$Name),
                    alpha = rep(0.004),
                    beta = rep(0.11))
colnames(intakespp) <- intakespp[1,]
intakespp <- as.data.frame(intakespp[-1,])

# predator size preference ratio same for all species
sizepref <- rbind(species = sort.default(sppsubset$Name),
                  mu = rep(0.5),
                  sigma = rep(2))
colnames(sizepref) <- sizepref[1,]
sizepref <- as.data.frame(sizepref[-1,])

# dimensions for singletons file
singletons <- data.frame(
  debug=4,
  Nyrs=fit_nyears, #may want to take from survey time series instead
  Nspecies=length(sppsubset$Name),
  Nsizebins=10, #5, try extending... June 2022
  Nareas=1,
  Nfleets=2, # cod vs others May 2022 #length(unique(NOBAom_ms$truecatchage_ss$fleet)), #Apr 2022
  Nsurveys=2,
  wtconv=1,
  yr1Nphase=1,
  recphase=-1,
  avg_rec_phase=1,
  recsigmaphase=-1,
  avg_F_phase=-1,
  dev_rec_phase=-1,
  dev_F_phase=-1,
  fqphase=1,
  sqphase=-1,
  ssig_phase=-1,
  csig_phase=-1,
  phimax=1,
  scaleInitialN=1,
  otherFood=1e-7,
  LFI_size=60,
  bandwidth_metric=5,
  assessmentPeriod=3,
  flagMSE=0,
  flagLinearRamp=1,
  baseline_threshold=0.2
)

# M1 same for all species, depends on n length bins
# to get annual M1 of 0.1, this matrix needs to be M1 per timestep
# right now input works for 5 timesteps per year, but should be 
# done in the code with input annual M1 in case timestep changes
M1mat <- matrix(0.02, singletons$Nspecies, singletons$Nsizebins, 
                dimnames = list(c(sort.default(sppsubset$Name)),  c(paste0("sizeclass",seq(1:singletons$Nsizebins))))) 
M1 <- as.data.frame(M1mat)

# sexratio fixed 50:50 for all species
sexratio <- matrix(0.5, 1, singletons$Nspecies,
                   dimnames = list("ratio", c(sort.default(sppsubset$Name))))


# placeholders for unused covariates dimensioned to the number of simulated years
dummycovar <- data.frame(dummy1 = rep(1, singletons$Nyrs))

# placeholder for effort data dimensioned by year and nfleets
allfleets <- atlantisom::load_fisheries(d.name, fisheries.file)
#fleets <- unique(NOBAom_ms$truecatchage_ss$fleet)

# Apr 2022 lets just have one fleet for testing
#fleets <- "allcombined"

# June 2022 one fleet for cod, one for all others
fleets <- c("allbutcod", "codfleet")

dummyeffort <- data.frame(1:singletons$Nyrs,
                          matrix(1, singletons$Nyrs, singletons$Nfleets))
names(dummyeffort) <- c("year", fleets)

# will need to fit von B and maturity curve functions to back out hydra input pars
# maturity from maturity ogive input in biol file
# growth functions from length at age in model...

# write the csvs
# output folder 
o.name <- here("SkillAssessment/msinput/hydra/")

write_csv(foodweb, paste0(o.name,"foodweb_NOBA.csv"))
write_csv(specieslist, paste0(o.name,"speciesList_NOBA.csv"))
write_csv(lenwtpar, paste0(o.name,"lengthweight_species_NOBA.csv"))
write.csv(intakespp, paste0(o.name,"intake_species_NOBA.csv"))
write.csv(sizepref, paste0(o.name,"mortality_M2sizePreference_NOBA.csv"))
write.table(t(singletons), paste0(o.name,"singletons_NOBA.csv"), sep=",",  col.names=FALSE)
write.csv(M1, paste0(o.name,"mortality_M1_NOBA.csv"))
write_csv(dummycovar, paste0(o.name,"recruitment_covariates_NOBA.csv"))
write_csv(dummycovar, paste0(o.name,"maturity_covariates_NOBA.csv"))
write_csv(dummycovar, paste0(o.name,"growth_covariates_NOBA.csv"))
write_csv(dummyeffort, paste0(o.name,"observation_effort_NOBA.csv"))
write.csv(sexratio, paste0(o.name,"sexratio_NOBA.csv"))

Decisions on size bins for hydra: keep 5 bins per species for now, build in flexibility to allocate size data to any number of bins with any definitions.

June 2022: expanding size bins to 10 per species. The model was not getting enough information from 5 bins; they were too aggregated.
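
As a rough sketch of that flexibility (hypothetical bin edges and lengths; the actual allocation below uses modbins and upper.bins), base R cut() can assign lengths to any number of bins with any definitions:

# hypothetical bin edges and observed lengths
binedges <- c(0, 10, 20, 40, 80, 200)
lengths_cm <- c(5, 12, 33, 150)
# label each length with its size bin
cut(lengths_cm, breaks = binedges,
    labels = paste0("sizebin", seq_len(length(binedges) - 1)))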

First we’ll get the growth parameters for each species using the length data generated from Atlantis. We estimate parameters for both von Bertalanffy and exponential growth functions.

For these surveys, von Bertalanffy parameters cannot be estimated for Mackerel in any era (and Mackerel only show up in the fall survey).

# VBGF packages all take age and length pairs, reformat from length comp at agecl
# break into 3 chunks of time, "era" 48 years each more like our survey
# subsample (NEW: BEFORE AND) after reformatting with uncount because we wouldn't have this many ages
# first subsample is by length keeps lengths with >5 frequency but reduces n to 10% of original
# after reformatting second subsample is random (10% again to get to 1% of original)
# no difference in estimated parameters without second subsample, so do it for speed

len_comp_data <- read_savedsurvs(d.name, 'survLen')

annage_comp_data <- read_savedsurvs(d.name, 'survAnnAge')

vb1 <- vbFuns()
#multiple surveys named in list object, use as filename
for(s in names(len_comp_data)){
  
  annages <- annage_comp_data[[s]][[1]] %>%
    group_by(species, time, agecl) %>%
    rename(trueage = agecl) %>%
    summarise(truenatage = sum(atoutput))
    
  # Figure out the groups that have multiple age classes in each stage (or
  # cohort)
  # line from omdimensions.R
  sp_age <- NOBAom_ms$funct.group_ss[, c("Name", "NumCohorts", "NumAgeClassSize")]

  multiple_ages <- sp_age[sp_age$NumAgeClassSize>1,]
  
  # finds the average age of each agecl each timestep based on truenatage
  svlen_avgage <- annages %>%
    dplyr::semi_join(len_comp_data[[s]][[1]], by = c("species", "time")) %>%
    dplyr::inner_join(multiple_ages, by = c("species" = "Name")) %>%
    dplyr::mutate(agecl = as.integer(ceiling(trueage/NumAgeClassSize))) %>%
    dplyr::group_by(species, time, agecl) %>%
    dplyr::mutate(avgage = weighted.mean(trueage, truenatage))


  #arrange into long format: year, Species, agecl, length 
  svlenage <- len_comp_data[[s]][[1]] %>%
    dplyr::left_join(svlen_avgage) %>%
    dplyr::mutate(year = ceiling(time/stepperyr)) %>%
    dplyr::select(species, agecl, avgage, year, upper.bins, atoutput) %>%
    dplyr::mutate(avgage = coalesce(avgage,agecl)) %>%
    dplyr::rename(length = upper.bins) %>%
    dplyr::mutate(era = as.numeric(cut_number(year,3)),
                  nsamp = round(atoutput*0.1)) %>% # initial pass includes lengths w/>5 individuals
    #tidyr::uncount(atoutput) %>% # breaks R, need to subsample first 
    tidyr::uncount(nsamp) %>% # atoutput column has original n lengths
    dplyr::group_by(species, avgage, year) %>%
    dplyr::sample_frac(0.1) # second subsample random within avgage to get 1% of original
  
  saveRDS(svlenage, file.path(d.name, paste0(scenario.name, "_",
                                                       s, "agelensamp.rds")))
# }
# 
#   # read saved sample from above
#   survs <- list.files(path=d.name, pattern = "agelensamp.rds", full.names = TRUE)
# 
#   survname <-  str_match(survs, paste0(scenario.name,"_\\s*(.*?)\\s","*agelensamp.rds"))[,2]
# 
#   agelensamp_data <- lapply(survs, readRDS)
# 
#   names(agelensamp_data) <- survname
# 
# 
# for(s in names(agelensamp_data)){

  # try different start points to see if we can get more species fit
  # this one goes to 11!
  # need because group_map doesn't keep group names
  gk <- svlenage %>%
    group_by(species, era) %>%
    group_keys()
  
  svvbgf3.2 <- svlenage %>%   
    dplyr::group_by(species, era) %>%
    group_map(~try(nls(length~vb1(avgage, Linf, K, t0), data = .,
                       start = vbStarts(length~avgage,data = .,methLinf="longFish"))))
  
  #this gets exponential growth pars, probably won't use this growth mod though
  svexp3 <- svlenage %>%
    dplyr::group_by(species, era) %>%
    group_modify(~ broom::tidy(lm(log(length) ~ log(avgage), data = .x))) %>%
    select(species, era, term, estimate) %>%
    pivot_wider(names_from = "term", values_from = "estimate") %>%
    mutate(psi = exp(`(Intercept)`)) %>%
    rename(kappa = `log(avgage)`) %>%
    select(species, era, kappa, psi)
    
  vbout.2 <- data.frame() 
  for(i in 1:length(svvbgf3.2)) {
    if(is.list(svvbgf3.2[[i]])) {
      spe <- cbind(as.data.frame(gk)[i,], as.data.frame(t(coef(svvbgf3.2[[i]]))))
      vbout.2 <- rbind(vbout.2, spe)
    }
  }
  
  growthpars <- left_join(svexp3, vbout.2) %>%
    mutate(growthType = case_when(
      is.na(Linf) ~ 2,
      TRUE ~ 3)) # vonB without covariates default, exponential if no vonB
  
  # sadly doesn't work so just make a csv for each era and use the one with most spp
  # maxera <- growthpars %>% 
  #   filter(!is.na(Linf)) %>% 
  #   group_by(era) %>% 
  #   tally() %>%
  #   filter(n==1) %>%
  #   select(era)
  
  growthspp <- growthpars %>%
    select(species, psi, kappa, Linf, K, growthType) %>%
    pivot_longer(psi:growthType, names_to = "par") %>%
    pivot_wider(names_from = species, 
                values_from = value) 
  
  growthspp.era <- split(growthspp, list(growthspp$era)) 
  
  for(era in names(growthspp.era)) {
    gpar <- growthspp.era[[era]] %>%
      ungroup()%>%
      select(-era) %>%
      remove_rownames() %>% 
      column_to_rownames(var="par") %>%
      write.csv(paste0(o.name,"growth_species_NOBA_era",era,"_",s,".csv"))
    
  }
  
}

With growth parameters estimated, we determine the size bin definitions. The simplest approach is to have equal sized bins (max observed length/desired number of bins):

# pull observed min and max size from data to optimize start and end bins
minmaxlen <- list()
# read in saved vonB pars--see if seasons are different and will need a replacement if K wasn't estimated
#era <- 1
vonBK <- list()
minmaxvonBK <- list()
for(s in names(len_comp_data)){
  minmaxlendf <- len_comp_data[[s]][[1]] %>%
    dplyr::group_by(species) %>%
    dplyr::summarise(minlen = min(lower.bins, na.rm = TRUE),
                     maxlen = max(upper.bins, na.rm = TRUE),
                     range = maxlen-minlen) 
  
  for(era in 1:3){
  vonBKer <- read.csv(paste0(o.name,"growth_species_NOBA_era",era,"_",s,".csv")) %>%
    rename(par = X) %>%
    pivot_longer(!par, names_to = "species") %>%
    pivot_wider(names_from = par,
                names_prefix = era,
                values_from = value) %>%
    select(species, paste0(era,"K"))
  
  ifelse(era==1,
         vonBKdf <- vonBKer,
         vonBKdf <- left_join(vonBKdf, vonBKer))
  }

  minmaxvonBKdf <- left_join(minmaxlendf,vonBKdf)
  
  minmaxlen[[s]] <- minmaxlendf
  vonBK[[s]] <- vonBKdf
  minmaxvonBK[[s]] <- minmaxvonBKdf

}

equalbinwidths <- function(Lmax, Nsizebins){
  bindef <- max(1,ceiling(Lmax/Nsizebins))
  binwidths <- rep(bindef, Nsizebins)
  return(binwidths)
}

# for fixed bins across surveys find min and max observed lengths 

maxLrange <- bind_rows(minmaxvonBK) %>%
  group_by(species) %>%
  summarise(minminL = min(minlen),
            maxmaxL = max(maxlen))
  
Nsizebins  <-  singletons$Nsizebins

newbins <- maxLrange %>%
  group_by(species) %>%
  group_modify(~broom::tidy(equalbinwidths(.$maxmaxL, Nsizebins))) %>%
  mutate(lb = paste0("sizebin",seq(1:Nsizebins))) %>%
  pivot_wider(names_from = lb, values_from = x)

sizebins <- as.data.frame(newbins) %>%
#  {if (utils::packageVersion("dplyr") > "0.9.9") {
#    relocate(species, .after = last_col())
#  } else {
    select(-species, everything()) %>%
#  }} 
  rename('commentField-notused' = species)

write_csv(sizebins, paste0(o.name,"length_sizebins_NOBA.csv"))

Survey biomass and catch datasets are written in long input format (all surveys together, and eventually all fleets together). Fishery catch is not currently separated into fleets; let's do that in phase II:

These datasets are filtered to the model fit start and end years defined above.

However, we need to recode the years for input into Hydra, which expects years 1:Nyrs.

#read in survey biomass data
survObsBiom <- atlantisom::read_savedsurvs(d.name, 'survB') #reads in all surveys

# get cvs from config files
svcon <- list.files(path=here("SkillAssessment/config/"), pattern = "*survey*", full.names = TRUE)
fishcon <- list.files(path=here("SkillAssessment/config/"), pattern = "*fishery*", full.names = TRUE)

omlist_ss <- NOBAom_ms
source(here("SkillAssessment/config/omdimensions.R"))

svcvlook <- tibble()
for(c in 1:length(svcon)){
  source(svcon[c])
  surv_cv_n <- surv_cv %>% 
    mutate(survey=survey.name)
  svcvlook <- bind_rows(svcvlook, surv_cv_n)
}

fcvlook <- tibble()
for(c in 1:length(fishcon)){
  source(fishcon[c])
  fish_cv_n <- fish_cv %>%
    mutate(fishery=fishery.name)
  fcvlook <- bind_rows(fcvlook, fish_cv_n)
}

allsvbio <- tibble()

#multiple surveys named in list object, use as filename
for(s in names(survObsBiom)){
  #arrange into wide format: year, Species1, Species2 ... and write csv
  svbio <- survObsBiom[[s]][[1]] %>%
    dplyr::filter(time %in% fittimes) %>%
    dplyr::mutate(year = ceiling(time/stepperyr)) %>%
    dplyr::mutate(year = year-fitstartyr) %>% #year starts at 1
    dplyr::select(species, year, atoutput) %>%
    dplyr::rename(biomass = atoutput) %>%
    dplyr::mutate(survey = s) %>%
    dplyr::left_join(svcvlook) %>%
    dplyr::select(survey, year, species, everything()) %>%
    dplyr::arrange(survey, species, year)
  
  allsvbio <- bind_rows(allsvbio, svbio)
}
write_csv(allsvbio, paste0(o.name,"observation_biomass_NOBA_allsurvs.csv"))

catchbio_ss <- read_savedfisheries(d.name, 'Catch')

# map species to fleets
# for now a cod fleet and an everyone else fleet
# this is made in a code block below lines 1440- ish
fleetdef <- read.csv(paste0(o.name, "fishing_q_indicator_NOBA.csv"))
fleetdef$Name <- as.character(fleetdef$Name)

allcatch <- tibble()

# remove species with no catch across whole time series Apr 2022
# added lines 655-659
for(f in names(catchbio_ss)){
  catchbio <- catchbio_ss[[f]][[1]] %>%
    #dplyr::filter(time>0) %>%
    dplyr::filter(time %in% fittimes.days) %>%
    dplyr::mutate(year = time/365) %>%
    dplyr::mutate(year = year-fitstartyr) %>% #year starts at 1
    dplyr::select(species, year, atoutput) %>%
    dplyr::rename(catch = atoutput) %>%
    dplyr::group_by(species) %>%
    dplyr::mutate(totcatch = sum(catch)) %>%
    dplyr::ungroup() %>%
    dplyr::filter(totcatch > 0) %>%
    dplyr::select(-totcatch) %>%
    dplyr::mutate(area = 1) %>%
    dplyr::left_join(fcvlook) %>%
    dplyr::left_join(fleetdef, by = c("species" = "Name")) %>%
    dplyr::mutate(fishery = qind) %>%
    dplyr::select(fishery, area, year, species, catch, cv) %>%
    dplyr::arrange(fishery, species, year)
      
  allcatch <- bind_rows(allcatch, catchbio)  
}

write_csv(allcatch, paste0(o.name,"observation_catch_NOBA_allfisheries.csv"))

(Below some diagnostic B, catch, and F plots to see if Atlantis is doing what we meant. No peeking.)

#optional Atlantis check: print true B and catch time series
NOBArun <- "sacc_38"

biotrue <- ggplot(NOBAom_ms$truetotbio_ss, aes(x=time, y=atoutput)) +
  geom_line() +
  theme_tufte() +
  labs(title = paste0("True bio NOBA ", NOBArun)) +
  facet_wrap(~species, scales = "free_y")

biotrue

catchtrue <- ggplot(NOBAom_ms$truecatchbio_ss, aes(x=time, y=atoutput)) +
  geom_line() +
  theme_tufte() +
  labs(title = paste0("True catch NOBA ", NOBArun)) +
  facet_wrap(~species, scales = "free_y")

catchtrue

# what is F
file.mort <- file.path(d.name, paste0(scenario.name, "Mort.txt"))

mortish <- read.table(file.mort, header = TRUE)

relF_ss <- mortish %>%
  select(Time, c(paste0(NOBAom_ms$code_ss, ".F"))) %>%
  pivot_longer(-Time, names_to = "spcode", values_to = "relF")

relM_ss <- mortish %>%
  select(Time, c(paste0(NOBAom_ms$code_ss, ".M"))) %>%
  pivot_longer(-Time, names_to = "spcode", values_to = "relM")

plotF <- ggplot(data=relF_ss, aes(x=Time/365, y=relF)) +
  geom_line() +
  theme_tufte() +
  labs(subtitle = paste(scenario.name, NOBArun)) + 
  facet_wrap(~spcode, scales = "free_y")

plotF 

Now write length composition input files in long format to have surveys together in one file. Also add fishery length comps:

sizebins <- read_csv(paste0(o.name,"length_sizebins_NOBA.csv"))

modbins <- sizebins %>%
  dplyr::rename(species = `commentField-notused`) %>%
  tidyr::pivot_longer(!species, names_to = "sizebin", values_to = "binwidth") %>%
  dplyr::group_by(species) %>%
  dplyr::mutate(modbin.min = pcumsum(binwidth),
         modbin.max = cumsum(binwidth)) %>%
  dplyr::mutate(sizebin = factor(sizebin, levels = unique(sizebin))) 

  #stringr::str_order(modbins$sizebin, numeric = TRUE)
  #arrange(species, sizebin)

allsvlenbin <- tibble()

#multiple surveys named in list object, add to file
for(s in names(len_comp_data)){
  #arrange into long format: year, Species, agecl, length 
  svlenbin <- len_comp_data[[s]][[1]] %>%
    dplyr::filter(time %in% fittimes) %>%
    dplyr::mutate(year = ceiling(time/stepperyr)) %>%
    dplyr::mutate(year = year-fitstartyr) %>% #year starts at 1
    dplyr::select(species, agecl, year, upper.bins, atoutput) %>%
    dplyr::left_join(modbins) %>%
    dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>%
    dplyr::group_by(species, year, sizebin) %>%
    dplyr::summarise(sumlen = sum(atoutput)) %>%
    tidyr::spread(sizebin, sumlen) %>%
    ungroup() %>%
    dplyr::mutate(inpN = rowSums(.[,-(1:2)], na.rm = TRUE)) %>%
    dplyr::mutate(type = 0) %>%
    dplyr::mutate(survey = s) %>%
    dplyr::mutate_at(vars(contains("sizebin")), ~./inpN) %>%
    dplyr::select(survey, year, species, type, inpN, everything())
  
  allsvlenbin <- bind_rows(allsvlenbin, svlenbin)
}
# use -999 for missing value
allsvlenbin <- allsvlenbin %>%
  replace(is.na(.),-999)

# cap inpN at 1000
allsvlenbin$inpN[allsvlenbin$inpN > 1000] <- 1000

write_csv(allsvlenbin, paste0(o.name,"observation_lengths_NOBA_allsurvs.csv"))

#requires read_savedfisheries function, added to atlantisom 30 June
catchlen_ss <- atlantisom::read_savedfisheries(d.name, "catchLen")

# FISHERY IS SAMPLED 5x PER YEAR--NEED TO ACCOUNT FOR THIS?
# This code sums lengths over all fishery timesteps within each year

allfishlenbin <- tibble()
for(f in names(catchlen_ss)){
  #arrange into long format: year, Species, agecl, length 
  fishlenbin <- catchlen_ss[[f]][[1]] %>%
    dplyr::filter(time %in% fittimes) %>%
    dplyr::mutate(year = ceiling(time/stepperyr)) %>%
    dplyr::mutate(year = year-fitstartyr) %>% #year starts at 1
    dplyr::select(species, agecl, year, upper.bins, atoutput) %>%
    dplyr::left_join(modbins) %>%
    dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>%
    dplyr::group_by(species, year, sizebin) %>%
    dplyr::summarise(sumlen = sum(atoutput)) %>%
    tidyr::spread(sizebin, sumlen) %>%
    ungroup() %>%
    dplyr::mutate(inpN = rowSums(.[,-(1:2)], na.rm = TRUE)) %>%
    dplyr::mutate(type = 0) %>%
    dplyr::left_join(fleetdef, by = c("species" = "Name")) %>%
    dplyr::rename(fishery = qind) %>%
    dplyr::mutate(area = 1) %>%
    dplyr::mutate_at(vars(contains("sizebin")), ~./inpN) %>% #proportion
    dplyr::select(fishery, area, year, species, type, inpN, everything()) %>%
    dplyr::arrange(fishery, area, species, year)
  
  allfishlenbin <- bind_rows(allfishlenbin, fishlenbin)
}

#add back columns with no lengths as NAs
missing <- setdiff(modbins$sizebin, names(allfishlenbin))
allfishlenbin[missing] <- NA_real_ 
colorder <- c("fishery", "area", "year", "species", "type", "inpN", levels(modbins$sizebin))
allfishlenbin <- allfishlenbin[colorder]

# use -999 for missing value
allfishlenbin <- allfishlenbin %>%
  replace(is.na(.),-999)

# cap inpN at 1000
allfishlenbin$inpN[allfishlenbin$inpN > 1000] <- 1000


write_csv(allfishlenbin, paste0(o.name,"observation_lengths_NOBA_allfisheries.csv"))

Write diet composition data in long format:

# same file as for MSSPM, diet proportion by agecl
all_diets <- atlantisom::read_savedsurvs(d.name, 'survDiet')

# get size bins for predators
sizebins <- read_csv(paste0(o.name,"length_sizebins_NOBA.csv"))

modbins <- sizebins %>%
  dplyr::rename(species = `commentField-notused`) %>%
  tidyr::pivot_longer(!species, names_to = "sizebin", values_to = "binwidth") %>%
  dplyr::group_by(species) %>%
  dplyr::mutate(modbin.min = pcumsum(binwidth),
         modbin.max = cumsum(binwidth)) %>%
  dplyr::mutate(sizebin = factor(sizebin, levels = unique(sizebin))) 

  #stringr::str_order(modbins$sizebin, numeric = TRUE)
  #arrange(species, sizebin)

# get Dirichlet alphamults from config files
svcon <- list.files(path=here("SkillAssessment/config/"), pattern = "*survey*", full.names = TRUE)

svalphamultlook <- tibble()
for(c in 1:length(svcon)){
  source(svcon[c])
  svalp <- tibble(
    surv_alphamult_n = alphamult, 
    survey = survey.name
  )
  svalphamultlook <- bind_rows(svalphamultlook, svalp)
}

# diet is by predator agecl, need diet by predator sizebin
# map predator agecl to lengthbin using length_comp_data from same survey

allsvdietprop <- tibble()

#multiple surveys named in list object
for(s in names(all_diets)){
  #arrange into lookup for agecl to lengthbin in numbers of fish 
  svagelenbin <- len_comp_data[[s]][[1]] %>%
    dplyr::filter(time %in% fittimes) %>%
    dplyr::mutate(year = ceiling(time/stepperyr)) %>%
    dplyr::mutate(year = year-fitstartyr) %>% #year starts at 1
    dplyr::select(species, agecl, year, upper.bins, atoutput) %>%
    dplyr::left_join(modbins) %>%
    dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>%
    dplyr::group_by(species, year, agecl, sizebin) %>%
    dplyr::summarise(sumlen = sum(atoutput)) %>%
    dplyr::group_by(species, year, sizebin) %>%
    dplyr::mutate(propage = sumlen/sum(sumlen)) #proportion of each agecl contributing to lengthbin
  
  svdietprop <- all_diets[[s]][[1]] %>%
    dplyr::filter(time.days %in% fittimes.days) %>%
    dplyr::mutate(year = ceiling(time.days/365)) %>%
    dplyr::mutate(year = year-fitstartyr) %>% #year starts at 1
    # join with agecl to length lookup
    dplyr::select(species, agecl, year, prey, dietSamp) %>%
    dplyr::left_join(svagelenbin) %>%
    dplyr::mutate(dietpropage = dietSamp*propage) %>% #reweight diets for lengthbins
    dplyr::group_by(species, year, sizebin, prey) %>%
    dplyr::summarise(dietsize = sum(dietpropage)) %>%
    dplyr::filter(prey %in% unique(modbins$species)) %>% #drops prey that aren't our modeled species
    tidyr::spread(prey, dietsize) %>%
    ungroup() %>%
    dplyr::filter(!is.na(sizebin)) 
  
   # add back any modeled species that weren't prey so they show up as columns
  missedpreds <- setdiff(unique(modbins$species), names(svdietprop)[-(1:3)])
  svdietprop[missedpreds] <- NA_real_
  svdietprop[,-(1:3)] <- svdietprop[unique(modbins$species)]
  svdietprop <- svdietprop %>% 
    dplyr::mutate(allotherprey = 1-rowSums(.[,-(1:3)], na.rm = TRUE)) %>%
    dplyr::mutate(survey = s) %>%
    dplyr::left_join(svalphamultlook) %>% #what is effective sample size for Dirichlet?
    dplyr::mutate(inpN = max(surv_alphamult_n/100000, 5)) %>% #rescale this input for now
    dplyr::select(survey, year, species, sizebin, inpN, everything(), -surv_alphamult_n)
  
  allsvdietprop <- bind_rows(allsvdietprop, svdietprop)
}

# use -999 for missing value
allsvdietprop <- allsvdietprop %>%
  replace(is.na(.),-999)

write_csv(allsvdietprop, paste0(o.name,"observation_diets_NOBA_allsurvs.csv"))

We output temperature data from surveys using the “Temp” variable in the .nc file and the atlantisom::load_nc_physics function. A survey (timestep, polygons, possibly observation error) can then be applied to this output.

We can get surface and bottom temp from Atlantis using the layers from each polygon, with layer 0 being the bottom and the maximum layer in the polygon being the surface (according to the Atlantis wiki, Atlantis output files/Atlantis NetCDF structure).

The below has been formalized into an atlantisom function, and reformatted to have both surveys in one file to match other data inputs:

# taken from existing load_nc to explore
# file.nc <- file.path(d.name, "outputnordic_runresults_01.nc")
# 
#   # Load ATLANTIS output!
#   at_out <- RNetCDF::open.nc(con = file.nc)
# 
#   # Get info from netcdf file! (Filestructure and all variable names)
#   var_names_ncdf <- sapply(seq_len(RNetCDF::file.inq.nc(at_out)$nvars - 1),
#     function(x) RNetCDF::var.inq.nc(at_out, x)$name)
#   n_timesteps <- RNetCDF::dim.inq.nc(at_out, 0)$length
#   n_boxes     <- RNetCDF::dim.inq.nc(at_out, 1)$length
#   n_layers    <- RNetCDF::dim.inq.nc(at_out, 2)$length
# 
# RNetCDF::close.nc(at_out)

#'@physic_variables A character value specifying which variables
#'   should be loaded. Only one variable can be loaded at a time.
#'   Currently, only the following variables are allowed:
#'   "salt", "NO3", "NH3", "Temp", "Oxygen", "Si", "Det_Si", "DON",
#'    "Chl_a", "Denitrifiction", "Nitrification".

truetemp <- atlantisom::load_nc_physics(dir = d.name,
                  file_nc = "nordic_runresults_01.nc",
                  physic_variables = "Temp",
                  aggregate_layers = FALSE,
                  bboxes = atlantisom::get_boundary(NOBAom_ms$boxpars))
  #if(verbose) message("Temperature read in.")

# this averages all depth layers, don't want this
# truetempagg <- atlantisom::load_nc_physics(dir = d.name,
#                   file_nc = "outputnordic_runresults_01.nc",
#                   physic_variables = "Temp",
#                   aggregate_layers = TRUE,
#                   bboxes = atlantisom::get_boundary(NOBAom_ms$boxpars))

truebottomtemp <- truetemp %>%
  filter(layer == 0) # according to Atlantis wiki, layer 0 is bottom

truesurfacetemp <- truetemp %>%
  filter(layer<7) %>%  #layer 7 is always identical temp to layer 0, may be the added sediment layer?
  group_by(polygon, time) %>%
  filter(layer==max(layer)) %>%
  ungroup

surveyenv <- function(trueenv, 
                      survboxes,
                      survtimes) {
  surveydat <- trueenv %>%
    dplyr::filter(polygon %in% survboxes) %>%
    dplyr::filter(time %in% survtimes)
  
  return(surveydat)
}

omlist_ss <- NOBAom_ms
source(here("SkillAssessment/config/omdimensions.R"))

#need to weight by polygon area to get mean annual temp time series for each survey
boxarea <- map_dfr(NOBAom_ms$boxpars$boxes, "area") %>%
  pivot_longer(everything(), names_to = "polygon", values_to = "area") %>%
  mutate(polygon = as.integer(polygon)) %>%
  mutate(proparea = area/sum(area))


source(here("SkillAssessment/config/mssurvey_fall.R"))

fallcensus_bottomtemp <- surveyenv(truebottomtemp, survboxes = survboxes, survtimes = survtime)

fallcensus_btempindex <- left_join(fallcensus_bottomtemp, boxarea) %>%
  dplyr::group_by(time) %>%
  dplyr::summarise(meantemp = weighted.mean(atoutput, proparea)) %>%
  dplyr::filter(time %in% fittimes) %>%
  dplyr::mutate(year = ceiling(time/stepperyr)) %>%
  dplyr::select(year, meantemp) %>%
  write_csv(paste0(o.name,"observation_temperature_NOBA_",survey.name,".csv"))

source(here("SkillAssessment/config/mssurvey_spring.R"))

springcensus_bottomtemp <- surveyenv(truebottomtemp, survboxes = survboxes, survtimes = survtime)

springcensus_btempindex <- left_join(springcensus_bottomtemp, boxarea) %>%
  dplyr::group_by(time) %>%
  dplyr::summarise(meantemp = weighted.mean(atoutput, proparea)) %>%
  dplyr::filter(time %in% fittimes) %>%
  dplyr::mutate(year = ceiling(time/stepperyr)) %>%
  dplyr::select(year, meantemp) %>%
  write_csv(paste0(o.name,"observation_temperature_NOBA_",survey.name,".csv"))

Code to plot survey temps to check against Atlantis inputs:

#optional temperature plots
NOBArun <- "sacc_38" 

fallavgtemp <- ggplot(fallcensus_btempindex, aes(x=year, y=meantemp)) + 
  geom_line() + 
  labs(title = paste0("Fall mean bottom temperature NOBA", NOBArun))

springavgtemp <- ggplot(springcensus_btempindex, aes(x=year, y=meantemp)) + 
  geom_line() + 
  labs(title = paste0("Spring mean bottom temperature NOBA", NOBArun))

library(patchwork)
fallavgtemp + springavgtemp

Stomach weights are needed as input to hydra. At present each species has a stomach weight for each size bin that is repeated for each year. We have total consumed weight for each predator age class at each timestep in the detailed diet Atlantis output, so we would need to map age classes to length bins, sum consumed weight by length bin, and divide by total numbers in that length bin to get stomach weight.

What we have saved so far is a diet comp in proportions that has had bias and observation error added. However, we need the stomach weight portion of the equation, which has the survey timing/area/efficiency/selectivity applied to true consumed weight. The hydra input is mean daily stomach weight in grams. From the Atlantis manual, “DetailedDietCheck.txt returns the total consumed biomass since the last output given for each cell (box and layer) of each age group of each functional group.” So daily per capita consumed biomass is this output, summed over prey for total consumption, divided by the numbers from the same survey design, and divided by the output step omlist_ss$runpar$outputstep.
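
A minimal sketch of that calculation (the inputs are hypothetical stand-ins: detdiet for the processed DetailedDietCheck output with consumed biomass in atoutput, survnums for numbers at agecl from the same survey design with a nums column; the tonnes-to-grams conversion is an assumption):

percapstomwt <- detdiet %>%
  dplyr::group_by(species, agecl, time) %>%
  dplyr::summarise(totcons = sum(atoutput), .groups = "drop") %>% # sum over prey
  dplyr::left_join(survnums, by = c("species", "agecl", "time")) %>% # adds nums column
  dplyr::mutate(stomwt_g = totcons / nums / omlist_ss$runpar$outputstep * 1e6) # assumed t to g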

STILL NEEDS RESOLUTION: the per capita consumption from the Atlantis DetailedDietCheck.txt seems low relative to actual data.

See here for this calculation applied to NOBA and CCA, and see below for an update based on new total consumption calculations from PROD.nc.

For now, we map existing Georges Bank species intake-at-size (average daily stomach weight) to the NOBA species as follows:

We will substitute intake-at-size from these similar taxa/size GB species for NOBA:

We will modify or substitute intake-at-size for the rest of the NOBA species based on size:

  • NOBA Redfish have a maximum size of 47 cm, use haddock intake
  • NOBA Saithe have a maximum size of 130 cm; use Acod intake
  • NOBA Polar_cod have a maximum size of 25 cm; use herring intake
  • NOBA Capelin are smaller than any GB modeled species, max size 21 cm; use herring intake
  • NOBA Green_halibut are large, max size 120 cm; use spinydog intake

Code to make intake csv for input; hardcoded based on these decisions.

# this is intake by species in current hydra_sim

# ,sizeclass1,sizeclass2,sizeclass3,sizeclass4,sizeclass5
# spinydog,0.090189781,1.952671779,9.334322368,28.80493233,77.55885308
# winterskate,0.093423453,0.939220418,3.915192982,11.46903371,44.88111925
# Aherring,0.01,0.03,0.8,1.5,3
# Acod,0.035,0.103736364,1.952671779,18.42060702,84.48736889
# haddock,0.031,0.096183673,0.939220418,3.915192982,12.07313904
# yellowtailfl,0.016072464,0.086965517,0.237958824,0.442743017,1.583142857
# winterfl,0.016072464,0.086965517,0.237958824,0.442743017,1.583142857
# Amackerel,0.01,0.08,1.3,2,3
# silverhake,0.039893347,0.179748344,0.512104803,4.879748476,20.262
# goosefish,0.090189781,1.952671779,9.334322368,28.80493233,77.55885308

stomwt <- matrix(0, singletons$Nspecies, singletons$Nsizebins, 
                dimnames = list(c(sort.default(sppsubset$Name)),  c(paste0("sizeclass",seq(1:singletons$Nsizebins)))))

stomwt["Blue_whiting",] <- c(0.039893347,0.179748344,0.512104803,4.879748476,20.26) #silverhake
stomwt["Capelin",] <- c(0.01,0.03,0.8,1.5,3) #Aherring
stomwt["Green_halibut",] <- c(0.090189781,1.952671779,9.334322368,28.80493233,77.55885308) #spinydog
stomwt["Haddock",] <- c(0.031,0.096183673,0.939220418,3.915192982,12.07313904) #haddock
stomwt["Long_rough_dab",] <- c(0.016072464,0.086965517,0.237958824,0.442743017,1.583142857) #yellowtailfl
stomwt["Mackerel",] <- c(0.01,0.08,1.3,2,3) #Amackerel
stomwt["North_atl_cod",] <- c(0.035,0.103736364,1.952671779,18.42060702,84.48736889) #Acod
stomwt["Norwegian_ssh",] <- c(0.01,0.03,0.8,1.5,3) #Aherring
stomwt["Polar_cod",] <- c(0.01,0.03,0.8,1.5,3) #Aherring
stomwt["Redfish",] <- c(0.031,0.096183673,0.939220418,3.915192982,12.07313904) #haddock
stomwt["Saithe",] <- c(0.090189781,1.952671779,9.334322368,28.80493233,77.55885308) #spinydog

write.csv(stomwt, paste0(o.name,"intake_stomachContent_NOBA.csv"))

JUNE 2022

Now that we are expanding to 10 size bins, I either need to interpolate these proxies based on GB, or find another source for this input. An alternative approach would be to use the atlantisom biomass_eaten output to get total per capita consumption.
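
For example, one way to interpolate a 5-bin proxy onto 10 bins (a sketch only, assuming the 5 values sit at evenly spaced midpoints of the new bin range):

# Acod intake row from the comment block above
intake5 <- c(0.035, 0.103736364, 1.952671779, 18.42060702, 84.48736889)
# linear interpolation onto 10 bin midpoints
intake10 <- approx(x = seq(1, 10, length.out = 5), y = intake5, xout = 1:10)$y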

This is output from a new atlantisom::calc_pred_cons() function based on the Atlantis PROD.nc output. It will be incorporated into atlantisom::run_truth() once assumptions about output units and similar details can be confirmed.

# here we have to rerun parts of atlantisom::run_truth() that weren't saved previously.
# first redefine the file names etc as used within run_truth

dir <- d.name

file_fgs <- functional.groups.file #"nordic_groups_v04.csv"
file_biolprm <- biol.prm.file #"nordic_biol_incl_harv_v_011_1skg.prm"
file_bgm <- box.file #"Nordic02.bgm"
file_init <- initial.conditions.file #"nordic_biol_v23.nc"
file_runprm <- run.prm.file #"nordic_run_v01.xml"
scenario <- scenario.name #"nordic_runresults_01"

select_groups <- mskeyrun::simFocalSpecies$Name # 11 mskeyrun fish species

verbose <- FALSE

# run_truth snippets below

  # Read in the functional groups csv since that is used by many functions
  fgs <- load_fgs(dir = dir, file_fgs = file_fgs)
  # Read in the biomass pools
  bps <- load_bps(dir = dir, fgs = file_fgs, file_init = file_init)
  # Read in the biological parameters
  biol <- load_biolprm(dir = dir, file_biolprm = file_biolprm)
  # Read in the run parameters
  runprm <- load_runprm(dir = dir, file_runprm = file_runprm)

  nc_out <- paste0(scenario, ".nc")
  nc_prod <- paste0(scenario, "PROD.nc")
  
  # Get the boundary boxes
  allboxes <- load_box(dir = dir, file_bgm = file_bgm)
  boxes <- get_boundary(allboxes)



  eat <- load_nc(dir = dir,
                     file_nc = nc_prod,
                     bps = bps,
                     fgs = fgs,
                     select_groups = select_groups,
                     select_variable = "Eat",
                     check_acronyms = TRUE,
                     bboxes = boxes)
  if(verbose) message("Eaten read in.")

  grazing <- load_nc(dir = dir,
                 file_nc = nc_prod,
                 bps = bps,
                 fgs = fgs,
                 select_groups = select_groups,
                 select_variable = "Grazing",
                 check_acronyms = TRUE,
                 bboxes = boxes)
  if(verbose) message("Grazing read in.")

  vol <- load_nc_physics(dir = dir,
                         file_nc = nc_out,
                         physic_variables = "volume",
                         aggregate_layers = FALSE,
                         bboxes = boxes)
  if(verbose) message("Volume read in.")

# now run atlantisom::calc_pred_cons() and save it
  biomass_eaten <- atlantisom::calc_pred_cons(eat = eat, grazing = grazing, vol = vol, biolprm = biol, runprm = runprm)

  saveRDS(biomass_eaten, file.path(d.name, paste0(scenario.name, "_", "biomass_eaten.rds")))

We have a preliminary true biomass_eaten (total consumption) output for each predator, so we take that, apply our surveys with a new atlantisom wrapper om_consindex(), and divide the survey consumption at agecl by total survey numbers at agecl to get per capita stomach weight (intake) at agecl. This then needs to be converted to length bins, which follows a similar process to the agecl-based diet. We’ll take the average intake in each length bin over the period being modeled (Atlantis years 40-80 ish) as the hydra input. However, survey agecl sampling isn’t implemented yet, so we use the true consumption at agecl below; total consumption can come from the survey.

omlist_ss <- NOBAom_ms

omlist_ss$truecons_ss <- biomass_eaten %>%
  dplyr::mutate(atoutput = bio_eaten) #needed for create_survey to work

cons_cv <- surv_cv # will need to add to survey files, use same format as surv_cv

# test sample survey consumption
sample_survey_consumption <- function(dat,cv) {

    #sum over boxes and ages (the sampled boxes were already subset in create functions)
    totconsB <- aggregate(dat$atoutput,list(dat$species,dat$time),sum)
    names(totconsB) <- c("species","time","constons") #consumption in tons

    #add lognormal observation error; with meanlog = log(m) - var/2 the
    #observations are mean-unbiased around the true consumption
    totconsBobs <- merge(totconsB,cv,by="species",all.x=T)
    totconsBobs$var <- log(totconsBobs$cv^2+1)
    totconsBobs$obsConsB <- rlnorm(nrow(totconsBobs), log(totconsBobs$constons)-totconsBobs$var/2, sqrt(totconsBobs$var))


    out <- data.frame(species=totconsBobs$species,
                      agecl = NA,
                      polygon=NA,
                      layer=NA, time=totconsBobs$time,
                      atoutput=totconsBobs$obsConsB)

    return(out)
}
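
# quick check that the lognormal error model above is mean-unbiased:
# with meanlog = log(m) - var/2, E[obs] = m and CV(obs) = cv
set.seed(123)
m <- 1000                 # true consumption
cv <- 0.1
v <- log(cv^2 + 1)
obs <- rlnorm(1e5, log(m) - v/2, sqrt(v))
mean(obs)                 # ~1000
sd(obs)/mean(obs)         # ~0.1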


# test wrapper
om_consindex <- function(usersurvey = usersurvey_file,
                     omlist_ss,
                     n_reps = n_reps,
                     save = TRUE){

  #one script for dimension parameters to be used in multiple functions
  source(here("SkillAssessment/config/omdimensions.R"), local = TRUE)

  # user options for survey--default is a census with mid-year sample
  # allows multiple surveys
  survObsConsBs <- list()

  for (s in usersurvey)
  {
    source(s, local = TRUE)
    
    # what form of survey efficiency and selectivity should be applied to consumption?
    # create_survey function expects NatAge input so need to rename input to use this

    survey_consB <- atlantisom::create_survey(dat = omlist_ss$truecons_ss, #from calc_pred_cons()
                                          time = survtime,
                                          species = survspp,
                                          boxes = survboxes,
                                          effic = surveffic,
                                          selex = survselex)

    # this is the step to repeat n_reps time if we want different realizations
    # of the same survey design specified above; only observation error differs
    # using the census cv of 0 will produce identical reps!
    survObsConsB <- list()
    for(i in 1:n_reps){
      survObsConsB[[i]] <- sample_survey_consumption(survey_consB, cons_cv)
    }

    #save survey indices, takes a long time to generate with lots of reps/species
    if(save){
      saveRDS(survObsConsB, file.path(d.name, paste0(scenario.name, "_",
                                                     survey.name, "survObsConsB.rds")))
    }

    survObsConsBs[[survey.name]] <- survObsConsB
  }


  indices <- list("survObsConsB" = survObsConsBs)

  return(indices)
}

om_consindex(usersurvey = c(here("SkillAssessment/config/mssurvey_spring.R"), 
                            here("SkillAssessment/config/mssurvey_fall.R")), 
             omlist_ss = omlist_ss,
             n_reps = 1,
             save = TRUE)

We have a total consumption index but not consumption at age, which we need for intake.

This plot compares true consumption (blue) with the sampled total consumption above (black and grey points are the fall and spring surveys, respectively; survey efficiency is 1, there is no age selectivity, and the cv for each species is 0.1).

trueaggcons <- omlist_ss$truecons_ss %>%
  dplyr::group_by(species, time) %>%
  dplyr::summarise(totaggcons = sum(atoutput))

survey.name <- c("BTS_fall_allbox_effic1", "BTS_spring_allbox_effic1")

survcons1 <- readRDS(paste0(d.name, "/", scenario.name, "_",
                           survey.name[1], "survObsConsB.rds"))

survcons2 <- readRDS(paste0(d.name, "/", scenario.name, "_",
                           survey.name[2], "survObsConsB.rds"))

plotcons <- ggplot2::ggplot() +
  geom_line(data = trueaggcons, aes(x=time, y=totaggcons), 
            color = "blue", alpha = 0.5) +
  geom_point(data = survcons1[[1]], aes(x=time, y=atoutput), color = "gray") +
  geom_point(data = survcons2[[1]], aes(x=time, y=atoutput)) +
  ggthemes::theme_tufte() +
  facet_wrap(~species, scales = "free")
  
plotcons

We could compare true consumption at age with sampled consumption at age once we figure out how to sample it; for now we use true consumption for intake, which is the easiest thing to include in the hydra .dat file.

truenumsagecl <- omlist_ss$truenums_ss %>%
  dplyr::group_by(species, time) %>%
  dplyr::summarise(totN = sum(atoutput)) #over agecl, layer and polygon

# Atlantis output rabbit hole # 2956294
# are the numbers (Nums) consistent between nc files? I'm worried they aren't:
# duplicated numbers by layer for some species that don't add up to annage output?
# OK, we checked and they do add up in the plot below; the main difference is
# that annagebio.nc has 0 for mackerel in timesteps where they migrate out of the
# system while .nc has nothing
# the initial condition (time 0) had repeated numbers per layer

nums1 <- atlantisom::load_nc(dir = dir,
                 file_nc = nc_out,
                 bps = bps,
                 fgs = fgs,
                 select_groups = select_groups,
                 select_variable = "Nums",
                 check_acronyms = TRUE,
                 bboxes = boxes)

# Andy confirmed that this gives the same output as above, but it won't run on my machine
# nums2 <- atlantistools::load_nc(nc = file.path(d.name, nc_out),
#                                 bps = bps,
#                                 fgs = file.path(d.name,file_fgs),
#                                 select_groups = "Long_rough_dab",
#                                 select_variable = "Nums",
#                                 check_acronyms = TRUE,
#                                 prm_run = file.path(d.name,file_runprm),
#                                 bboxes = boxes)


truenumsage <- omlist_ss$truenumsage_ss %>%
    dplyr::group_by(species, time) %>%
    dplyr::summarise(totN = sum(atoutput)) #over true age and polygon

 
# check for agreement in Nums output between nc files, they agree 
ggplot2::ggplot() +
  geom_point(data=truenumsage, aes(x=time, y=totN), color = "blue", alpha = 0.2) +
  geom_point(data=truenumsagecl, aes(x=time, y=totN), color = "yellow", alpha = 0.2) +
  facet_wrap(~species, scales = "free")

# so now can divide the cons at age class by the nums at age class and convert to lengths

truenumsatagecl <- omlist_ss$truenums_ss %>%
  dplyr::group_by(species, time, agecl) %>%
  dplyr::summarise(totNagecl = sum(atoutput)) #over layer and polygon

intake <- omlist_ss$truecons_ss %>%
  dplyr::group_by(species, time, agecl) %>%
  dplyr::summarise(totconsagecl = sum(atoutput)) %>%
  dplyr::left_join(truenumsatagecl) %>%
  dplyr::mutate(intakeg = (totconsagecl/totNagecl)*1000000) #tons per fish converted to grams per fish

# for most species older ages have higher per capita intake as expected
# these look more like the right scale for the species
ggplot2::ggplot() +
  geom_point(data=intake, aes(x=time, y=intakeg, colour = agecl)) +
  facet_wrap(~species, scales = "free")

# now make the intake at length bin for the .dat file  
# this will be based on "real" intake for now rather than survey estimated
# BUT! length comps are from surveys. so subset intake to survey times to match
# then... average them?

len_comp_data <- atlantisom::read_savedsurvs(d.name, 'survLen')

intakeatlen <- tibble()

#multiple surveys named in list object
for(s in names(len_comp_data)){
  #arrange into lookup for agecl to lengthbin in numbers of fish 
  svagelenbin <- len_comp_data[[s]][[1]] %>%
    dplyr::filter(time %in% fittimes) %>%
    dplyr::select(species, agecl, time, upper.bins, atoutput) %>%
    dplyr::left_join(modbins) %>%
    dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>%
    dplyr::group_by(species, time, agecl, sizebin) %>%
    dplyr::summarise(sumlen = sum(atoutput)) %>%
    dplyr::group_by(species, time, sizebin) %>%
    dplyr::mutate(propage = sumlen/sum(sumlen)) #proportion of each agecl contributing to lengthbin
  
  lenintake <- intake %>%
    dplyr::mutate(time = as.integer(time)) %>%
    dplyr::filter(time %in% unique(svagelenbin$time)) %>%
    # join with agecl to length lookup
    dplyr::left_join(svagelenbin) %>%
    dplyr::filter(!is.na(sizebin)) %>% # see note below
    dplyr::mutate(conspropage = totconsagecl*propage) %>% #reweight cons for lengthbins
    dplyr::mutate(totNpropage = totNagecl*propage) %>% #reweight nums for lengthbins
    dplyr::group_by(species, time, sizebin) %>%
    dplyr::summarise(intakesize = (sum(conspropage)/sum(totNpropage))*1000000) %>%
    tidyr::complete(sizebin) %>% #adds back any missing sizebin factor levels
    tidyr::spread(sizebin, intakesize) %>%
    ungroup()  
  
  intakeatlen <- bind_rows(intakeatlen, lenintake)
  
}

# NOTE we have some agecl consumption and numbers data with no equivalent size bin
# this is because the lengths are estimated from a sample of true ages
# and sampling doesn't get age classes that have very few numbers
# for now I think this is ok, just recognize that the conversion to length
# bins makes this not quite "true" consumption at length
# most of the misses are for larger cod in this dataset, but it's 0.8% of rows

avgintake <- intakeatlen %>%
  dplyr::group_by(species) %>%
  dplyr::summarise(across(everything(), ~mean(.x, na.rm=T))) %>%
  dplyr::select(-time) %>%
  dplyr::rename_with(~gsub("bin", "class", .x))

# the last issue is we still have blanks in here, which is not ok for the .dat file
# two possible ways to fill are to use the adjacent value or to interpolate
# below just fills with the nearest available value
# could get fancier: interpolate if intake is increasing with length,
# carry over the adjacent value if not
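
# toy illustration of .direction = "updown": first fill upward (each NA takes
# the next value below it), then downward for any trailing NAs
tidyr::fill(tibble::tibble(x = c(NA, 1, NA, 2, NA)), x, .direction = "updown")
# x becomes 1, 1, 2, 2, 2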

avgintakefill <- avgintake %>%
  dplyr::mutate_all(~ifelse(is.nan(.), NA, .)) %>%
  tidyr::pivot_longer(-species) %>%
  dplyr::group_by(species) %>%
  tidyr::fill(value, .direction = "updown") %>%
  tidyr::pivot_wider()
  

write_csv(avgintakefill, paste0(o.name,"intake_stomachContent_NOBA.csv"))
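
To make the agecl-to-size-bin reweighting above concrete, here is a toy example with hypothetical numbers: two age classes split across two size bins, with propage summing to 1 within each size bin, as in the survey lookup built in the loop.

lookup <- tibble::tibble(
  agecl   = c(1, 1, 2),
  sizebin = c("sizebin1", "sizebin2", "sizebin2"),
  propage = c(1.0, 0.25, 0.75) #proportion of each agecl in the size bin
)

cons <- tibble::tibble(
  agecl        = c(1, 2),
  totconsagecl = c(10, 30),   #tons consumed by each age class
  totNagecl    = c(2e6, 5e5)  #numbers in each age class
)

dplyr::left_join(cons, lookup, by = "agecl") %>%
  dplyr::group_by(sizebin) %>%
  dplyr::summarise(intakesize = sum(totconsagecl * propage) /
                     sum(totNagecl * propage) * 1e6) #grams per fish
# sizebin1: 5 g per fish; sizebin2: ~28.6 g per fish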

Remaining unused parameters (recruitment, maturity, etc.) need to be dimensioned correctly for the number of species and years:

#recruitment dummy params
recpars <- matrix(1, 27, singletons$Nspecies,
                  dimnames = list(c("eggRicker_alpha",
                                    "eggRicker_shape",
                                    "eggRicker_beta",
                                    "DS_alpha",
                                    "DS_shape",
                                    "DS_beta",
                                    "gamma_alpha",
                                    "gamma_shape",
                                    "gamma_beta",
                                    "ricker_alpha",
                                    "ricker_shape",
                                    "ricker_beta",
                                    "BH_alpha",
                                    "BH_shape",
                                    "BH_beta",
                                    "shepherd_alpha",
                                    "shepherd_shape",
                                    "shepherd_beta",
                                    "hockey_alpha",
                                    "hockey_shape",
                                    "hockey_beta",
                                    "segmented_alpha",
                                    "segmented_shape",
                                    "segmented_beta",
                                    "sigma",
                                    "type",
                                    "stochastic"
                  ), c(sort.default(sppsubset$Name))
                  ))

# change rectype input to 9=avg+devs
recpars[c("type"),] <- 9
                  
write.csv(recpars, paste0(o.name,"recruitment_species_NOBA.csv"))

Rcoveff <- data.frame(name = c(sort.default(sppsubset$Name)), speciesEffect1 = 0)

write_csv(Rcoveff, paste0(o.name,"recruitment_covariateEffects_NOBA.csv"))

#the other covariate effects are also set to 0 so just use the same thing
write_csv(Rcoveff, paste0(o.name,"maturity_covariateEffects_NOBA.csv"))
write_csv(Rcoveff, paste0(o.name,"growth_covariateEffects_NOBA.csv"))

fecundity <- matrix(1, 2, singletons$Nspecies,
                   dimnames = list(c("d", "h"), c(sort.default(sppsubset$Name))))

write.csv(fecundity, paste0(o.name,"fecundity_species_NOBA.csv"))

fecunditytheta <- as.data.frame(matrix(0, singletons$Nspecies, singletons$Nsizebins, 
                dimnames = list(c(sort.default(sppsubset$Name)),  c(paste0("sizeclass",seq(1:singletons$Nsizebins)))))) 

write.csv(fecunditytheta, paste0(o.name,"fecundity_Theta_NOBA.csv"))

#these might need to be real; atlantis has a maturity ogive but I need pars
maturity <- matrix(1, 2, singletons$Nspecies,
                   dimnames = list(c("nu", "omega"), c(sort.default(sppsubset$Name))))

write.csv(maturity, paste0(o.name,"maturity_species_NOBA.csv"))

#fishsel dimensions species rows fleets columns, one matrix each c and d, not using?
#fleets <- unique(NOBAom_ms$truecatchage_ss$fleet)
# Apr 2022 lets just have one fleet for testing
#fleets <- "allcombined"

# June 2022 one fleet for cod, one for all others
fleets <- c("allbutcod", "codfleet")

fishseldummy <- matrix(1, singletons$Nspecies, singletons$Nfleets,
                   dimnames = list(c(sort.default(sppsubset$Name)), fleets))

write.csv(fishseldummy, paste0(o.name,"fishing_selectivityc_NOBA.csv"))
write.csv(fishseldummy, paste0(o.name,"fishing_selectivityd_NOBA.csv"))

#discards dummy arrays, set to 0 discards and 0 survival for now
discards <- matrix(0, singletons$Nspecies*singletons$Nfleets, singletons$Nsizebins)

write.csv(discards, paste0(o.name,"fishing_discards_NOBA.csv"))
write.csv(discards, paste0(o.name,"fishing_discardsSurvival_NOBA.csv"))

#can we get away with not changing the sim parameters at the end of the dat file? no

# get true B0 from NOBA
B0 <- NOBAom_ms$truetotbio_ss %>%
  filter(time==0) %>%
  select(species, B0=atoutput)

write_csv(B0, paste0(o.name,"B0_NOBA.csv"))

# properly dimension assessment thresholds and effort scaling

#species column names with rowname additionalThreshold and 0s
spthresh <- matrix(0, 1, singletons$Nspecies, 
                   dimnames = c("additionalThreshold", list(sort.default(sppsubset$Name))))
                   
write.csv(spthresh, paste0(o.name,"assessmentThresholdsSpecies_NOBA.csv"))

#species column names no rowname all 1s
efscale <- as.data.frame(matrix(1, 1, singletons$Nspecies, 
                   dimnames = c("", list(sort.default(sppsubset$Name)))))
write_csv(efscale, paste0(o.name,"observation_effortScaling_NOBA.csv"))

#same formats for residence time (1=efscale) and outside mortality (0=spthresh)
write_csv(efscale, paste0(o.name,"restime_NOBA.csv"))

outmort <- as.data.frame(matrix(0, 1, singletons$Nspecies, 
                  dimnames = c("", list(sort.default(sppsubset$Name)))))
write_csv(outmort, paste0(o.name,"outsidemort_NOBA.csv"))

Create a pin file:

Initial N at true age can be pulled directly from Atlantis, but we need N at length, which only comes from survey sampling. If we use the first survey observation, this includes NA values and also misses some species that leave the model domain. I don’t think this works with the initial values in the .pin file.

An alternative is to use the modeling start time (after Atlantis spinup) but pick the timestep within the year when recruits come in for each species. Otherwise we miss the smallest fish, while if we sum or average over the year we double count fish that grow between length bins. The code below calculates lengths for each timestep within the first modeled year, then constructs the initial true N at length matrix using, for each species, the timestep with the smallest fish (recruitment time). We still miss two species because they are apparently smallest when outside the model domain.

We also need recruitment at the timestep when it happens in order to calculate average recruitment for each species. However, the YOY.txt output file simply repeats the recruitment for a given year at each timestep, so we select the timestep equivalent to the end of each year to calculate the average and deviations. Log mean recruitment and deviations are hydra .pin inputs.

We will draw the average recruitment value from Atlantis (a vector), and set the recdevs to 0 to start, but dimension the recdevs input as a matrix with year rows and species columns.

#this is true N at age at time 0, but perhaps less relevant given atlantis spinup
# startN <- NOBAom_ms$truenumsage_ss %>%
#   filter(time==0) %>%
#   group_by(species, agecl) %>%
#   summarise(trueN = sum(atoutput)) %>%
#   pivot_wider(names_from = agecl, values_from = trueN)

#True N at model fit start
#run this through atlantisom age2length? no equivalent wt info
# modstartN <- NOBAom_ms$truenumsage_ss %>%
#   filter(time %in% fittimes) %>%
#   filter(time == min(time)) %>%
#   group_by(species, agecl) %>%
#   summarise(trueN = sum(atoutput)) %>%
#   pivot_wider(names_from = agecl, values_from = trueN)

# true N at agecl at model fit start, can run through age2length
# keep first 5 timesteps (1 yr) to filter later for time when each species recruits
modstartN <- NOBAom_ms$truenums_ss %>%
  filter(time %in% fittimes[1:5]) %>%
  # group_by(species) %>%
  # filter(time == min(time)) %>%
  group_by(species, agecl, time) %>%
  summarise(atoutput = sum(atoutput)) %>%
  mutate(polygon = "NA",
         layer = "NA") %>%
  as.data.frame()

modstartresn <- NOBAom_ms$trueresn_ss %>%
  filter(time %in% fittimes[1:5]) %>%
  #group_by(species) %>%
  #filter(time == min(time)) %>%
  group_by(species, agecl, time) %>%
  summarise(atoutput = median(atoutput)) %>%
  mutate(polygon = "NA",
         layer = "NA") %>%
  as.data.frame()

modstartstructn <- NOBAom_ms$truestructn_ss %>%
  filter(time %in% fittimes[1:5]) %>%
  #group_by(species) %>%
  #filter(time == min(time)) %>%
  group_by(species, agecl, time) %>%
  summarise(atoutput = median(atoutput)) %>%
  mutate(polygon = "NA",
         layer = "NA") %>%
  as.data.frame()

# both below from survey configs
lenage_cv <- 0.1

# max size bin for length estimation, function defaults to 150 cm if not supplied
maxbin <- 200

modstartlen <- atlantisom::calc_age2length(structn = modstartstructn,
                                           resn = modstartresn,
                                           nums = modstartN,
                                           biolprm = NOBAom_ms$biol,
                                           fgs = NOBAom_ms$funct.group_ss,
                                           maxbin = maxbin,
                                           CVlenage = lenage_cv,
                                           remove.zeroes=TRUE)

#find the time with the smallest fish and use that for each spp
smallest <- modstartlen$natlength %>%
  group_by(species) %>%
  filter(lower.bins == min(lower.bins)) %>%
  filter(atoutput == max(atoutput)) %>%
  select(species, rectime=time)

#this should be in millions for hydra
# January 2022 change to log(millions)
  
modstartNlen <- modstartlen$natlength %>%
  dplyr::select(species, agecl, time, upper.bins, atoutput) %>%
  dplyr::left_join(smallest) %>%
  dplyr::filter(time == rectime) %>%
  dplyr::select(-time, -rectime) %>%
  dplyr::left_join(modbins) %>%
  dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>%
  dplyr::group_by(species, sizebin) %>%
  dplyr::summarise(sumlen = log(sum(atoutput)/1000000)) %>%
  tidyr::spread(sizebin, sumlen) %>%
  ungroup() %>%
  replace(is.na(.),0) %>% # use 0 for missing value in pin file
  select(-species) %>%
  as.matrix()

rownames(modstartNlen) = sort.default(sppsubset$Name)

# #alternatively, use the first set of survey lengths
# surveylen <- read_savedsurvs(d.name, "survLen")
# 
# #find earliest survey observation across all by time
# #keeps survey name in column_label if we need it
# #this hardcodes using the first replicate if there are replicate survey samples
# startNlen <- pmap(surveylen, bind_rows, .id="column_label")[[1]] %>%
#   dplyr::filter(time %in% fittimes) %>%
#   dplyr::group_by(species) %>% #not all species at all times
#   dplyr::filter(time == min(time)) %>% #earliest data for each species
#   dplyr::select(species, agecl, upper.bins, atoutput) %>%
#   dplyr::left_join(modbins) %>%
#   dplyr::filter(modbin.min < upper.bins & upper.bins <= modbin.max) %>%
#   dplyr::group_by(species, sizebin) %>%
#   dplyr::summarise(sumlen = sum(atoutput)) %>%
#   tidyr::spread(sizebin, sumlen) %>%
#   ungroup()

write.csv(modstartNlen, paste0(o.name,"observation_Y1N_NOBA.csv"))

#start with true recruitment, find single input for the year, average over years

# rectiming is a dataframe with species code, time_spawn (day of year),
# but I don't think we need it
  # # recruit_time (number of days), and recruit period (number of days).
  # # we use these to calculate recstart (day of year) and recend (day of year)
  # rectiming <- merge(NOBAom_ms$biol$time_spawn, NOBAom_ms$biol$recruit_time, by = 1)
  # rectiming <- merge(rectiming, NOBAom_ms$biol$recruit_period, by = 1)
  # names(rectiming) <- c("Code", "time_spawn", "recruit_time", "recruit_period")
  # rectiming <- rectiming %>%
  #   dplyr::mutate(recstart = time_spawn + recruit_time) %>%
  #   dplyr::mutate(recend = recstart + recruit_period)
  # 
  # #subset recpars to groups of interest
  # recstart_temp <- rectiming[rectiming$Code %in% NOBAom_ms$code_ss,]
  
  nitro <- merge(NOBAom_ms$biol$kwsr, NOBAom_ms$biol$kwrr, by = "1")
  names(nitro) <- c("Code", "KWSR", "KWRR")
  
  nitro <- nitro %>%
    group_by(Code) %>%
    mutate(Nsum = sum(KWSR, KWRR))

  truerec <- NOBAom_ms$YOY_ss %>% 
    pivot_longer(-Time, names_to = "Code", values_to = "recwt") %>%
    mutate(Code = gsub("\\.0", "", Code)) %>%
    filter(Time>0) %>% # this output not meaningful
    filter(Time %in% seq(365, max(Time), by=365)) %>%
    left_join(NOBAom_ms$funct.group_ss[, c("Code", "Name")]) %>%
    left_join(nitro) %>%
    # YOY.txt is in tonnes wet weight; 5e7 = 1e9 mg/t divided by a wet:dry ratio
    # of 20, so recwt * 5e7 / redfieldcn is mg N, and dividing by Nsum
    # (mg N per recruit) gives numbers
    mutate(recnums = (recwt * 50000000.0 / NOBAom_ms$biol$redfieldcn) / Nsum) %>%
    group_by(Name) %>%
    mutate(meanrecnums = mean(recnums))
  
  trueage1 <- NOBAom_ms$truenumsage_ss %>%
    filter(agecl==1) %>%
    group_by(species, agecl, time) %>%
    summarise(age1N = sum(atoutput)) %>%
    ungroup() %>%
    mutate(year = ceiling(time/stepperyr)) %>%
    group_by(species, agecl, year) %>%
    filter(age1N == max(age1N)) %>%
    select(Name=species, Time=time, age1N, year) %>%
    group_by(Name) %>%
    mutate(meanage1N = mean(age1N))
    
  #check whether truerec and trueage1 have similar patterns
  ggplot(truerec, aes(x=Time, y=recnums/meanrecnums)) + 
    geom_line() + 
    geom_point(data=trueage1, aes(x=Time*73, y=age1N/meanage1N)) + #73 days per output timestep
    facet_wrap(~Name, scales = "free_y")
  
  #model input log(millions)
  avgrec <- truerec %>%
    filter(Time %in% fittimes.days) %>%
    group_by(Name) %>%
    summarize(AvgRec = mean(log(recnums/1000000))) %>%
    ungroup() 
  
  #sub these in if the above is -Inf; the scale will be off but it's only a starting value
  avgage1N <- trueage1 %>%
    filter(Time %in% fittimes) %>%
    group_by(Name) %>%
    summarize(Avgage1N = mean(log(age1N/1000000))) %>%
    ungroup() 

  avgrec1 <- merge(avgrec, avgage1N) %>%
    filter(is.infinite(AvgRec)) %>%
    select(Name, AvgRec=Avgage1N)

  if(nrow(avgrec1)>0){ #use nrow; length() counts columns and would always be TRUE
    avgrec <- avgrec %>%
      filter(is.finite(AvgRec)) %>%
      full_join(avgrec1) %>%
      arrange(Name)
  }
  
  avgrec <- as.data.frame(avgrec  %>%
                            pivot_wider(names_from = Name, values_from = AvgRec), 
                          row.names = "AvgRec")
    
write.csv(avgrec, paste0(o.name,"AvgRecPinData_NOBA.csv"))

recdev <- truerec %>%
  filter(Time %in% fittimes.days) %>%
  mutate(logrec = log(recnums/1000000)) %>%
  group_by(Name) %>%
  mutate(meanrec = mean(logrec),
         recdevs = logrec-meanrec) %>%
  ungroup() %>%
  dplyr::mutate(year = Time/365) %>%
  dplyr::select(species=Name, year, recdevs) %>%
  tidyr::pivot_wider(names_from = species, values_from = recdevs, names_sort = TRUE)%>%
  replace(is.na(.),0) #for species with missing rec in YOY.txt
  
#visual check only works up to ungroup statement
#ggplot(recdev, aes(x=Time, y=recdevs)) + geom_line() + facet_wrap(~Name)

write_csv(recdev, paste0(o.name,"RecDevsPinData_NOBA.csv"))
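
# sanity check: recdevs should average ~0 within each species
# (exactly 0 only where no missing YOY values were replaced with 0 above)
recdev %>%
  tidyr::pivot_longer(-year, names_to = "species", values_to = "dev") %>%
  dplyr::group_by(species) %>%
  dplyr::summarise(meandev = mean(dev))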

# matrix q 1.00E-30 for species not caught in gear
# species rows gears columns
#fleets <- unique(NOBAom_ms$truecatchage_ss$fleet)

# Apr 2022 lets just have one fleet for testing
# see https://github.com/thefaylab/hydra_sim/issues/16
fleets <- "allcombined"

#which fisheries catch which species?
catchperfishery <- read.table(here("SkillAssessment/atlantisoutput/NOBA_sacc_38/nordic_runresults_01CatchPerFishery.txt"), header = TRUE) 

fishingq <- catchperfishery %>%
  pivot_longer(c(-Time, -Fishery), names_to = "species") %>%
  filter(value>0) %>%
  group_by(Fishery, species) %>%
  summarise(totalcatch = sum(value)) %>%
  left_join(NOBAom_ms$funct.group_ss, by = c("species"="Code")) %>%
  select(Fishery, Name, totalcatch) %>%
  mutate(totalcatch = 1) %>%
  filter(!is.na(Name)) %>%
  arrange(match(Fishery, fleets)) %>%
  pivot_wider(names_from = Fishery, values_from = totalcatch) %>%
  add_row(Name = c("Long_rough_dab", "Polar_cod")) %>%
  arrange(Name) %>%
  replace(is.na(.),0)

# reformat for one combined fleet
fishingq <- fishingq %>%
  group_by(Name) %>%
  mutate(allcombined = sum(across())) %>% #sums all fleet columns; grouping column Name is excluded
  select(Name, allcombined)

fishingq$allcombined[fishingq$allcombined > 1] <- 1

# June 2022 add a separate fleet for cod--hack but oh well
fishingq <- fishingq %>%
  dplyr::mutate(allbutcod = case_when(Name=="North_atl_cod" ~ 0,
                               TRUE ~ allcombined),
                codfleet = case_when(Name=="North_atl_cod" ~ 2,
                               TRUE ~ 0),
                qind = sum(allbutcod, codfleet)) %>% #fleet indicator: 1 = allbutcod, 2 = codfleet, 0 = unfished
  dplyr::select(Name, qind)


write_csv(fishingq, paste0(o.name,"fishing_q_indicator_NOBA.csv")) #indicator in .dat file

# actual fishing q  April 2022


# from Gavin, additions to .pin
# Fishing mortality
# Average F by fleet and area
# init_matrix avg_F(1,Nareas,1,Nfleets,avg_F_phase)  -1.6

# starting value: sum all catch for all species / sum all B for all species,
# take the mean and annual devs of that
# not yet implemented here; a hedged sketch follows this chunk

# Annual F deviations
# init_3darray F_devs(1,Nareas,1,Nfleets,1,Nyrs,dev_F_phase)
# all zeros to start, one value per area, fleet, and year:
# 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

# Fishery qs
# init_vector ln_fishery_q(1,Nqpars) (= sum of fishery_indicator_q - Nfleet*Narea)
# all zeros to start:
initfishq <- matrix(0, 1, 8) # Nqpars = sum(fishingq$qind) - Nfleets*Nareas = 10 - 2*1 = 8
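
# a hedged sketch of the avg_F starting value described in Gavin's note above.
# assumes true catch lives in an element named truecatchbio_ss (hypothetical
# name, same structure as truetotbio_ss) and that the catch and biomass outputs
# share a time index; the -1.6 default suggests the .pin value is on a log scale
totC <- NOBAom_ms$truecatchbio_ss %>%
  dplyr::group_by(time) %>%
  dplyr::summarise(C = sum(atoutput)) #catch summed over species

totB <- NOBAom_ms$truetotbio_ss %>%
  dplyr::group_by(time) %>%
  dplyr::summarise(B = sum(atoutput)) #biomass summed over species

Fseries <- dplyr::left_join(totC, totB, by = "time") %>%
  dplyr::mutate(Fprox = C/B) #exploitation rate as a proxy for F

avgF <- log(mean(Fseries$Fprox))   #log-scale starting value
Fdevs <- log(Fseries$Fprox) - avgF #annual deviations around the mean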


# matrix 0 species rows gears columns  
fishingerr <- matrix(0, singletons$Nspecies, singletons$Nfleets,
                   dimnames = list(c(sort.default(sppsubset$Name)), fleets))

write.csv(fishingerr, paste0(o.name,"fishing_error_NOBA.csv"))

# matrix species columns rows survey q and obs error
surveyinf <- matrix(c(1,0), 2, singletons$Nspecies,
                   dimnames = list(c("q", "obs_error"), c(sort.default(sppsubset$Name))))

  
write.csv(surveyinf, paste0(o.name,"survey_info_NOBA.csv"))

# will need to add fishery and survey selectivity pars to pin 

Write metadata on the surveys:

#write readme for metadata
meta <- c("Simulated data from Norweigan-Barents Atlantis Model by Hansen et al. \n\n",
          paste("Atlantis outputs stored in", d.name, '\n\n'),
          paste("Biomass output for surveys", names(survObsBiom), '\n\n'),
          paste("Fishery catch output for fisheries", names(catchbio_ss), '\n\n'),
          "Biomass and catch output units are annual tons.\n\n",
          "Years are indexed 1:Nyrs in files, corresponding to Atlantis years:\n",
          paste(fitstartyr, "to", fitendyr, '\n\n'),
          "Survey specifications:\n",
          "----------------------\n",
          read_file(here("SkillAssessment/config/mssurvey_fall.R")),
          "\n",
          "----------------------\n",
          read_file(here("SkillAssessment/config/mssurvey_spring.R")),
          "\n",
          "\n\n",
          "Fishery specifications:\n",
          "----------------------\n",
          read_file(here("SkillAssessment/config/msfishery.R")),
          "\n"
          )

cat(as.character(meta), file=paste0(here("SkillAssessment/msinput/hydra/"),"README.txt"))

Use the hydradata package to create the .dat and .pin files. We are working from the forked hydradata repository at https://github.com/thefaylab/hydradata for this project. All of the input .csv files created here have been copied to the data-raw folder in the thefaylab/hydradata repo. The create_RData_NOBA.R file was updated to read these inputs and sourced, and create_RData() was run to make a new hydraDataList.rda file in the package. Then the package is built locally, R is restarted, and the following code block is run to produce the input files.

library(hydradata)
inputs <- setup_default_inputs()
inputs$outDir <- here("SkillAssessment/msinput/hydra")
inputs$outputFilename <- "hydra_sim_NOBA" 



hydraDataList <- create_datpin_files(inputs,hydraDataList)

References

1. Hansen C, Skern-Mauritzen M, Meeren G van der, Jähkel A, Drinkwater KF. Set-up of the Nordic and Barents Seas (NoBa) Atlantis model [Internet]. Havforskningsinstituttet; 2016. Available: https://imr.brage.unit.no/imr-xmlui/bitstream/handle/11250/2408609/FoH_2-2016.pdf?sequence=1&isAllowed=y

2. Hansen C, Drinkwater KF, Jähkel A, Fulton EA, Gorton R, Skern-Mauritzen M. Sensitivity of the Norwegian and Barents Sea Atlantis end-to-end ecosystem model to parameter perturbations of key species. Tsikliras AC, editor. PLOS ONE. 2019;14: e0210419. doi:10.1371/journal.pone.0210419