fasstr
, the Flow
Analysis Summary Statistics Tool for R, is a set of R functions to tidy, summarize,
analyze, trend, and visualize streamflow data. This package summarizes
continuous daily mean streamflow data into various daily, monthly,
annual, and long-term statistics, completes trending and frequency
analyses, with outputs in both table and plot formats.
This package provides functions for streamflow data analysis, including:
add_*
and
fill_*
functions),screen_*
functions),calc_*
functions),compute_*
functions), and,plot_*
functions).Useful features of functions include:
tidyhydat
package to pull
streamflow data from a Water Survey of Canada HYDAT
database for analyses;This package is maintained by the Water Protection and Sustainability Branch of the British Columbia Land, Water and Resource Stewardship.
You can install fasstr
directly from CRAN:
To install the development version from GitHub, use the remotes
package then the fasstr
package:
if(!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("bcgov/fasstr")
Several other packages will be installed with fasstr
.
These include tidyhydat
for downloading Water Survey of Canada hydrometric data, zyp
for
trending, ggplot2
for creating plots, and tidyr
and dplyr
for data wrangling and summarizing, amongst others.
To use the station_number
argument and pull data
directly from a Water
Survey of Canada HYDAT database into fasstr
functions,
download a HYDAT file using the following code:
There are several vignettes to provide more information on the usage
of fasstr
functions and how to customize them with various
argument options.
All functions in fasstr
require a daily mean streamflow
data set from one or more hydrometric stations. Long-term and continuous
data sets are preferred for most analyses, but seasonal and partial data
can be used. Other daily time series data, like temperature,
precipitation or water levels, may also be used, but with certain
caution as some calculations/conversions are based on units of
streamflow (cubic metres per second). Data is provided to each function
using the either the data
argument as a data frame of flow
values, or the station_number
argument as a list of Water
Survey of Canada HYDAT station numbers.
When using the data
option, a data frame of daily data
containing columns of dates (YYYY-MM-DD in date format), values (mean
daily discharge in cubic metres per second in numeric format), and,
optionally, grouping identifiers (character string of station names or
numbers) is called. By default the functions will look for columns
identified as ‘Date’, ‘Value’, and ‘STATION_NUMBER’, respectively, to be
compatible with the ‘tidyhydat’ defaults, but columns of different names
can be identified using the dates
, values
,
groups
column arguments (ex.
values = Yield_mm
). The following is an example of an
appropriate data frame (STATION_NUMBER not required):
Alternatively, you can directly pull a flow data set directly from a
HYDAT database (if installed) by providing a list of station numbers in
the station_number
argument (ex.
station_number = "08NM116"
or
station_number = c("08NM116", "08NM242")
) while leaving the
data arguments blank. A data frame of daily streamflow data for all
stations listed will be extracted using tidyhydat
and then
fasstr
calculations will produce results of the
functions.
This package allows for multiple stations (or other groupings) to be
analyzed in many of the functions provided identifiers are provided
using the groups
column argument (defaults to
STATION_NUMBER). If grouping column doesn’t exist or is improperly
named, then all values listed in the values
column will be
summarized.
These functions, start with either add_*
or
fill_*
, add columns and rows, respectively, to streamflow
data frames to help set up your data for further analysis. Examples
include adding rolling means, adding date variables (WaterYear, Month,
DayofYear, etc.), adding basin areas, adding columns of volumetric
discharge and water yield, and filling dates with missing flow values
with NA
.
The analysis functions summarize your discharge values into various
statistics. screen_*
functions summarize annual data for
outliers and missing dates. calc_*
functions calculate
daily, monthly, annual, and long-term statistics (e.g. mean, median,
maximum, minimum, percentiles, amongst others) of daily, rolling days,
and cumulative flow data. compute_*
functions also analyze
data but produce more in-depth analyses, like frequency and trending
analysis, and may produce multiple plots and tables as a result. All
tables are in tibble data frame formats. Can use
write_flow_data()
or write_results()
to
customize saving tibbles to a local drive.
The visualization functions, which begin with plot_*
,
plot the various summary statistics and analyses as a way to visualize
the data. While most plotting function statistics can be customized,
some come pre-set with statistics that cannot be changed. Plots can be
further modified by the user using the ggplot2
package and
its functions. All plots functions produce lists of plots (even if just
one produced). Can use write_plots()
to customize saving
the lists of plots to a local drive (within folders or PDF
documents).
If certain n-day rolling mean statistics are desired to be analyzed
(e.g. 3- or 7-day rolling means) some functions provide the ability to
select for that as function arguments
(e.g. rolling_days = 7
and
rolling_align = "right"
). The rolling day align is the
placement of the date amongst the n-day means, where “right” averages
the day-of and previous n-1 days, “centre” date is in the middle of the
averages, and “left” averages the day-of and the following n-1 days. For
your own analyses you can add rolling means to your data set using the
add_rolling_means()
function.
To customize your analyses for specific time periods, you can
designate the start and end years of your analysis using the
start_year
and end_year
arguments and remove
any unwanted years (for partial data sets for example) by listing them
in the excluded_years
argument
(e.g. excluded_years = c(1990, 1992:1994)
). Alternatively,
some functions have an argument called complete_years
that
summarizes data from just those years which have complete flow records.
Some functions will also allow you to select the months of a year to
analyze, using the months
argument, as opposed to all
months (if you want just summer low-flows, for example). Leaving these
arguments blank will result in the summary/analysis of all years and
months of the provided data set.
To group analyses by water, or hydrologic, years instead of calendar
years, if desired, you can set water_year_start
within most
functions to another month than 1 (for January). A water year can be
defined as a 12-month period that comprises a complete hydrologic cycle
(wet seasons can typically cross calendar year), typically starting with
the month with minimum flows (the start of a new water recharge cycle).
If another start month is desired, you can choose it using the
water_year_start
argument (numeric month). The water year
identifier is designated by the year it ends in (e.g. a water year from
Oct 1, 1999 to Sep 30, 2000 is designated as 2000). Start, end and
excluded years will be based on the specified water year.
For your own analyses, you can add date variables to your data set
using the add_date_variables()
or
add_seasons()
functions.
Water yield statistics (in millimetres) calculated in the some of the
functions require an upstream drainage basin area (in sq. km) using the
basin_area
argument. If no basin areas are supplied, all
yield results will be NA
. To apply a basin area (10 sqkm
for example) to all daily observations, set the argument as
basin_area = 10
. If there are multiple stations or groups
to apply multiple basin areas (using the groups
argument),
set them individually using this option:
basin_area = c("08NM116" = 795, "08NM242" = 22)
. If a
STATION_NUMBER column exists with HYDAT station numbers, the function
will automatically use the basin areas provided in HYDAT, if available,
so basin_area
is not required. For your own analyses, you
can add basin areas to your data set using the
add_basin_area()
function.
With the use of the ignore_missing
argument in most
functions, you can decide how to handle dates with missing flow values
in calculations. When you set ignore_missing = TRUE
a
statistic will be calculated for a given year, all years, or month
regardless of if there are missing flow values. When
ignore_missing = FALSE
the returned value for the period
will be NA
if there are missing values. To allow some
missing dates and still calculate statistics, some functions also
including the allowed_missing
argument where you provide a
percentage (0 to 100) of missing days per time period.
Some functions have an argument called complete_years
which can be used, when set to TRUE
, to filter out years
that have partial data sets (for seasonal or other reasons) and only
years with full data are used to calculate statistics.
To determine the long-term summary statistics of daily data for each
month (mean, median, maximum, minimum, and some percentiles) you can use
the calc_longterm_daily_stats()
function. If the ‘Mission
Creek near East Kelowna’ hydrometric station is of interest you can list
the station number in the station_number
argument to obtain
the data (if tidyhydat
and HYDAT are installed). Statistics
over several months can also be calculated, if of interest. See the
summer statistics (from July to September) in this example.
To visualize the daily streamflow patterns on an annual basis, the
plot_daily_stats()
function will plot out various summary
statistics for each day of the year. Data can also be filtered for
certain years of interest (a 1981-2010 normals period for this example)
using the start_year
and end_year
arguments.
We can also compare individual years against the statistics using
add_year
argument like below.
Flow duration curves can be produced using the
plot_flow_duration()
function.
This package also provides a function,
compute_annual_frequencies()
, to complete a volume
frequency analysis by fitting annual minimums or maximums to Log-Pearson
Type III or Weibull probability distributions. See the volume frequency
analyses documentation for more information. For this example, the 7-day
low-flow quantiles are calculated for the Mission Creek hydrometric
station using the Log-Pearson Type III distribution and method of
moments fitting method (both default). With this, several low-flow
indicators can be determined (i.e. 7Q5, 7Q10).
freq_results <- compute_annual_frequencies(station_number = "08NM116",
start_year = 1981,
end_year = 2010,
roll_days = 7,
fit_distr = "PIII",
fit_distr_method = "MOM")
freq_results$Freq_Fitted_Quantiles
The probability of observed extreme events can also be plotted (using selected plotting position) along with the computed quantiles curve for comparison.