The {nert} package provides straightforward and seamless access to various datasets available on the TERN Data Portal. Included among these are the Soil Moisture Integration and Prediction System (SMIPS) datasets, which include soil moisture estimates that can be used in your data analytics pipelines.
This document showcases this process with the example of a synthesised multi-site grain production experiment. The {nert} package is loaded and used to download soil moisture time series data at various locations across South Australia. Recognising soil moisture as a classical confounder—affecting crop yield (via root-zone water availability) and nitrogen treatment (via increased volatilisation in moist soils)—we will augment our modelling of the grain production to use soil moisture estimates over the treatment period for estimating the nitrogen treatment effect.
A simulated dataset containing yield data for a fictitious grain
production experiment has been included with the {nert} package. The
dataset is called grain, and you can load the package and
this dataset with:
library(nert)
data(grain)
str(grain)
#> 'data.frame': 2880 obs. of 10 variables:
#> $ Site : Factor w/ 10 levels "Adelaide","Barossa Valley",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ Latitude : num -34.9 -34.9 -34.9 -34.9 -34.9 ...
#> $ Longitude : num 139 139 139 139 139 ...
#> $ SowDate : Date, format: "2022-05-20" "2022-05-20" "2022-05-20" "2022-05-20" ...
#> $ NitrogenDate : Date, format: "2022-06-04" "2022-06-04" "2022-06-04" "2022-06-04" ...
#> $ Rep : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
#> $ Variety : Factor w/ 8 levels "Variety_A","Variety_B",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ Nitrogen_kgNha : Factor w/ 4 levels "0","30","60",..: 1 1 1 2 2 2 3 3 3 4 ...
#> $ SeedRate_plantsm2: Factor w/ 3 levels "75","150","300": 1 2 3 1 2 3 1 2 3 1 ...
#> $ Yield_Tha : num 3 3.69 3.89 2.76 3.27 ...The grain dataset supposes that a hypothetical grain
production experiment has been run at a selection of ten sites across
South Australia, investigating the effects of four different treatment
levels of nitrogen application and three different seeding rates on the
grain yield for some crop with eight varieties.
Note that we have locational coordinates (in Latitude
and Longitude) for each of the sites, as well as dates for
the crop sowing and nitrogen applications (in SowDate and
NitrogenDate respectively.) This spatio-temporal
information is useful, as it allows us to match up the experimental
sites with the data points that we need to download from SMIPS.
Here we use the {nert} package to download SMIPS soil moisture
datasets from the TERN Data Portal, which we can use to augment our
analysis of the grain experiment data. Two key SMIPS
datasets of interest are:
totalbucket: an estimate of the volumetric soil
moisture (in units of mm) from the entire SMIPS 0-90cm soil
moisture store,
SMindex: the SMIPS soil moisture index
(i.e., a unitless index between 0 and 1 that approximates how full the
SMIPS bucket moisture store is relative to its full 90cm
capacity).
For simplicity, suppose we are interested in the SMIPS soil moisture
index (SMindex) data, for all days falling between the
earliest nitrogen application date up to 30 days after the last
application date, and we want the soil moisture index at each site (by
longitude/latitude coordinates). We can generate this date range in a
straightforward way using the NitrogenDate column of the
grain dataset as follows:
start_date <- min(grain$NitrogenDate)
end_date <- max(grain$NitrogenDate) + 30
date_range <- seq(start_date, end_date, by = "1 day")
c(date_range[1], date_range[length(date_range)])
#> [1] "2022-05-07" "2022-07-15"We also need the longitude and latitude for the sites, which we can
readily retrieve from the grain dataset columns
Longitude and Latitude:
sites <- unique(grain[, c("Site", "Longitude", "Latitude")])
sites
#> Site Longitude Latitude
#> 1 Adelaide 138.6244 -34.94773
#> 289 Barossa Valley 138.9580 -34.54843
#> 577 Clare Valley 138.6012 -33.83754
#> 865 Eyre Peninsula 135.7967 -34.37378
#> 1153 Fleurieu Peninsula 138.4608 -35.52307
#> 1441 Kangaroo Island 137.0736 -35.71041
#> 1729 Limestone Coast 140.6388 -37.88957
#> 2017 Sturt Vale 140.0723 -33.34666
#> 2305 Murraylands 139.3014 -35.12425
#> 2593 Yorke Peninsula 137.2090 -35.05124Now TERN supplies the SMIPS daily data rasters as cloud-optimised GeoTIFFs (or COGs), which contain the soil moisture point predictions across the entirety of Australia. As these are cloud-optimised, only spatial location data that is actually used in our analysis will be downloaded, which gives us great efficiency in space/time cost. The {nert} package works in tandem with the {terra} package to achieve this efficiency:
nert::read_smips(),terra::extract().This gives us a quicker and tighter data download, as we only stream
those bytes from the TERN server that are actually necessary for our
analysis. The below code shows this process. (Note that
terra::extract() is particular about the order of the
longitude/latitude coordinates: longitude should always be specified
first, followed by latitude.)
smips_data <- data.frame()
for (i in seq_along(date_range)) {
r <- read_smips(collection = "SMindex", date = date_range[i])
smips_points <- terra::extract(
x = r,
y = data.frame(lon = sites$Longitude, lat = sites$Latitude),
xy = TRUE
)
names(smips_points)[2] <- "smips_smi"
smips_data <- rbind(
smips_data,
data.frame(
Date = date_range[i],
Longitude = sites$Longitude,
Latitude = sites$Latitude,
smips_smi = smips_points$smips_smi
)
)
}
head(smips_data)
#> Date Longitude Latitude smips_smi
#> 1 2022-05-07 138.6244 -34.94773 0.25906488
#> 2 2022-05-07 138.9580 -34.54843 0.17677850
#> 3 2022-05-07 138.6012 -33.83754 0.07674548
#> 4 2022-05-07 135.7967 -34.37378 0.53408158
#> 5 2022-05-07 138.4608 -35.52307 0.37496096
#> 6 2022-05-07 137.0736 -35.71041 0.30223876We can add the Site column to the
smips_data to make it easier to use it in conjunction with
the grain dataset during the analysis:
smips_data$Site <- sites$Site
head(smips_data)
#> Date Longitude Latitude smips_smi Site
#> 1 2022-05-07 138.6244 -34.94773 0.25906488 Adelaide
#> 2 2022-05-07 138.9580 -34.54843 0.17677850 Barossa Valley
#> 3 2022-05-07 138.6012 -33.83754 0.07674548 Clare Valley
#> 4 2022-05-07 135.7967 -34.37378 0.53408158 Eyre Peninsula
#> 5 2022-05-07 138.4608 -35.52307 0.37496096 Fleurieu Peninsula
#> 6 2022-05-07 137.0736 -35.71041 0.30223876 Kangaroo IslandWe are now ready to proceed with the analytics.
First, we model the grain yield with a simple model without taking into account any soil moisture confounding—that is, without reference to the SMIPS data that we have just downloaded. We will later incorporate the SMIPS data to see how including the soil moisture as a covariate improves the nitrogen treatment effect size estimates.
Here we will use the {nlme} package to fit a linear mixed-effect
model. The grain yield Yield_Tha will be modelled taking
the Variety, nitrogen application rate
Nitrogen_kgNha and seeding rate
SeedRate_plantsm2 as fixed terms, and incorporating the
Site and replicate (Rep) structure of the
experiment in a random effect term.
library(nlme)
simple_model <- lme(
fixed = Yield_Tha ~ Variety * Nitrogen_kgNha * SeedRate_plantsm2,
random = ~ 1 | Site / Rep,
data = grain
)We can check the confidence intervals for the effect sizes of the nitrogen treatments for this simple model:
simple_model.ints <- intervals(simple_model, which = "fixed")$fixed
simple_model.Neffects <- simple_model.ints[paste0("Nitrogen_kgNha", c(30, 60, 90)), ]
simple_model.Neffects
#> lower est. upper
#> Nitrogen_kgNha30 -0.17355028 0.0392950 0.2521403
#> Nitrogen_kgNha60 0.08045472 0.2933000 0.5061453
#> Nitrogen_kgNha90 0.09779306 0.3106383 0.5234836Next, we augment our modelling by including the SMIPS soil moisture
data as a covariate (perhaps anticipating that the soil moisture
improves the yield, but might reduce the effect of nitrogen applied to
the soil due to volatilisation). To keep things simple, for each site
(and its associated nitrogen application date) we take an average of the
SMIPS-reported soil moisture index from the date of the nitrogen
application until 30 days after the application, which we will store in
a new column called SoilMoisture_avg.
for (site in sites$Site) {
start_date <- unique(grain[which(grain$Site == site), "NitrogenDate"])[1]
dates <- seq(start_date, start_date + 30, by = "1 day")
smips <- smips_data[which(smips_data$Site == site & smips_data$Date %in% dates), ]
smips_avg <- mean(smips[["smips_smi"]])
grain[which(grain$Site == site), "SoilMoisture_avg"] <- smips_avg
}We can then add this average soil moisture as a fixed effect to the linear mixed-effect model:
augmented_model <- lme(
fixed = Yield_Tha ~ Variety * Nitrogen_kgNha * SeedRate_plantsm2
+ SoilMoisture_avg + SoilMoisture_avg:Nitrogen_kgNha,
random = ~ 1 | Site / Rep,
data = grain
)The confidence intervals for the effect sizes of the nitrogen application treatments for this augmented model are then as follows:
augmented_model.ints <- intervals(augmented_model, which = "fixed")$fixed
augmented_model.Neffects <- augmented_model.ints[paste0("Nitrogen_kgNha", c(30, 60, 90)), ]
augmented_model.Neffects
#> lower est. upper
#> Nitrogen_kgNha30 -0.03984861 0.2000454 0.4399395
#> Nitrogen_kgNha60 0.31409573 0.5539898 0.7938838
#> Nitrogen_kgNha90 0.33961765 0.5795117 0.8194058We can use {ggplot2} plotting to graph the confidence intervals for the simple model versus the augmented model, and illuminate the difference attained when we include the soil moisture as a confounding term in our modelling:
library(ggplot2)
ci <- rbind(
data.frame(simple_model.Neffects, model = "Simple Model"),
data.frame(augmented_model.Neffects, model = "SMIPS-Augmented Model")
)
ci$term <- paste0("Nitrogen_kgNha", c(30, 60, 90))
pd <- position_dodge(-0.4)
ggplot(ci, aes(x = est., y = term, colour = model)) +
scale_y_discrete(limits = rev) +
geom_point(position = pd) +
geom_errorbarh(aes(xmin = lower, xmax = upper), position = pd, height = 0.2) +
labs(y = "predictor", x = "estimate")Nitrogen effect estimates compared between the simple model and the SMIPS-augmented model.
From the comparison of the effect sizes for the nitrogen treatment, we can see that ignoring the effect of soil moisture in the simple model underestimates the direct nitrogen treatment effect. The augmented model, in contrast, accounts for the soil moisture and the potential nitrogen volatilisation that may occur in higher soil moisture conditions.
collect_tern_data()Note that we have previously used loops to collect the SMIPS data
across a range of dates and spatial locations. However as of {nert}
version 0.0.3.9000, the collect_tern_data() function has
been made available, which streamlines this typical data aggregation
task. Collecting the same SMindex time series across the
different trial locations can now be achieved with a single call:
smips_dt <- collect_tern_data(
xy = data.frame(lon = sites$Longitude, lat = sites$Latitude),
date_range = c(min(grain$NitrogenDate), max(grain$NitrogenDate) + 30),
datasets = "SMIPS",
smips_collection = "SMindex",
verbose = FALSE
)
head(smips_dt)
#> date lon lat SMIPS_SMindex
#> <Date> <num> <num> <num>
#> 1: 2022-05-07 138.6244 -34.94773 0.25906488
#> 2: 2022-05-07 138.9580 -34.54843 0.17677850
#> 3: 2022-05-07 138.6012 -33.83754 0.07674548
#> 4: 2022-05-07 135.7967 -34.37378 0.53408158
#> 5: 2022-05-07 138.4608 -35.52307 0.37496096
#> 6: 2022-05-07 137.0736 -35.71041 0.30223876