Introduction

Knowledge of predator diets provides numerous insights into their ecology. Diet estimation therefore remains an active area of research in quantitative ecology. Quantitative Fatty Acid Signature Analysis (QFASA) is a method of estimating the diet composition of predators that was introduced by Iverson et al. (2004).

The fundamental unit of information in QFASA is a fatty acid signature (signature), which is a vector of proportions describing the fatty acid mass composition of lipids. For diet estimation, signatures from one or more predators and from samples of all types of prey potentially consumed by the predators are required. Calibration coefficients, which adjust for the differential metabolism of individual fatty acids by predators, are also required. Given those data inputs, a predator signature is modeled as a linear mixture of the mean prey-type signatures and diet composition is estimated as the specific mixture that minimizes a measure of distance between the observed and modeled predator signatures.

The package qfasar originated as a byproduct of exploration into the performance of QFASA, investigation of alternative modeling and estimation techniques, and the desire to share computational capacity among collaborators participating in that work. qfasar implements a variety of options for the estimation of predator diets in typical QFASA applications, including some new diagnostic measures, as well as methods to explore and potentially improve the performance of a library of prey signatures. In addition, qfasar is designed to facilitate additional research into improved estimation methods through implementation of various bootstrapping and simulation techniques. Several methods that have not previously been published have been included to facilitate investigation of their performance.

Because qfasar has been intentionally designed to support future research into QFASA methodology, the workflow that may be necessary is impossible to predict. Consequently, the user may need to do a little more work and assume greater responsibility for code development than is the case with some R packages that implement analyses with a more predictable workflow. Specific tasks and computations have been encapsulated into functions, but the user needs to know which functions to call and in what order. In other words, the user needs to be familiar with the breadth of functionality available. However, general guidance is provided for users whose only need is to estimate predator diet composition (see Getting started).

Getting started

Users whose objectives involve basic diet estimation need to know how to get their data organized for an analysis, understand the options for diet estimation that are available in qfasar, and ultimately estimate diet composition. That information is summarized in the following sections:

Data frames

Preparing for an analysis

Distance measures

Estimation spaces

Diet estimation

Users that wish to evaluate the performance of a prey library, explore a prey library for hidden structure to potentially improve diet estimates, or use diagnostic techniques that might provide insights into the suitability of diet estimates or potential estimation problems should also read the section Diagnostic functionality.

Users whose objectives involve simulation-based research should read the section Simulation functionality for an overview of the tools implemented in qfasar.

Getting help within R

Once qfasar has been installed, it can be attached and made available for use by typing the command library(qfasar).

To view a list of the contents of qfasar, type library(help=qfasar).

To view the documentation of a function, which may contain more technical information than is included in this vignette, type ?function_name.

Distribution list

To be notified when new versions of qfasar are available, e-mail a request to be added to the qfasar distribution list to [email protected]. Please include your name and affiliation in the request.

Data frames

Three types of data are potentially necessary when using qfasar. The user will need to use an appropriate base R function, such as read.csv(), to read data into data frames, possibly modify the data frames to get them in the required formats, and pass the data frames as arguments to qfasar functions. Note that qfasar has strict formatting requirements for the data frames, which are described in the following subsections.

Prey signature data
Predator signature data
Fatty acid suite data

Users may find it most convenient to store data in comma separated variable (csv) ASCII files in formats similar or identical to the data frames.

Prey signature data

The prey signature data frame contains information on prey types, unique sample identifiers, and fatty acid signature proportions or percentages. The formatting requirements for this file are:

The data frame must be in row-major format, i.e., each row contains information for an individual prey animal.
The first column must contain an alphanumeric designation of prey type. This will generally be a species of prey, or possibly subclasses of a species, e.g., large versus small or subadult versus adult.
The second column must contain an alphanumeric identifier unique to each prey animal, i.e., a sample ID. The ID should not be a number only, so that R recognizes the ID as a factor. Alternatively, one could force R to treat a numeric ID as a factor when the data are read.
The remaining columns must contain signature data. The data can be in the form of either proportions or percentages, as qfasar will internally convert the data to proportions for analysis.
The data frame must have column names, such as “type”, “id”, name of fatty acid 1, name of fatty acid 2, etc. These can most easily be included by having the names in the first row of the data file and using the header=TRUE option when reading the data into R. Fatty acid names in the prey data frame must exactly match the names in the predator and fatty acid suite data frames (described below).
NOTE: The data frame should contain data from all available fatty acids, not just a subset. The fatty acids to be used in an analysis are designated in the fatty acid suite data frame (described below).

As an example, the first few rows and columns of a prey signature data file used by Bromaghin et al. (2015), in comma separated variable (csv) format, follow. Note that extra spaces have been added between commas and the subsequent values to enhance readability here, but extra spaces should not be included in the data file.

Species,            ID,   c12.0,  c13.0,  Iso14,  c14.0,  c14.1w9,  c14.1w7,  c14.1w5
Atlantic argentine, E68,  0.17,   0.05,   0.02,   6.17,   0.21,     0.03,     0.07
Atlantic argentine, E69,  0.11,   0.05,   0.03,   6.37,   0.27,     0.03,     0.06
Atlantic argentine, E70,  0.14,   0.05,   0.03,   6.64,   0.26,     0.03,     0.06
Atlantic argentine, E71,  0.15,   0.05,   0.03,   7.16,   0.28,     0.02,     0.06
Atlantic argentine, E72,  0.13,   0.05,   0.02,   6.65,   0.29,     0.02,     0.05
Atlantic argentine, E73,  0.14,   0.04,   0.02,   5.68,   0.20,     0.02,     0.06
Atlantic argentine, E74,  0.16,   0.05,   0.02,   6.70,   0.25,     0.02,     0.06
Atlantic argentine, E75,  0.17,   0.04,   0.02,   6.32,   0.23,     0.03,     0.06
Atlantic argentine, E76,  0.11,   0.04,   0.02,   6.94,   0.27,     0.02,     0.06
Atlantic argentine, E77,  0.14,   0.05,   0.02,   6.97,   0.26,     0.03,     0.06
Butterfish,         E141, 0.30,   0.11,   0.03,   5.87,   0.09,     0.05,     0.05
Butterfish,         E142, 0.10,   0.10,   0.02,   4.93,   0.07,     0.03,     0.04

NOTE: the prey data file can contain columns with additional “tombstone” information such as sex, age, sampling date, sampling location, etc. If so, read the data file into a data frame and from that data frame create a second data frame in the required format. Including the sample ID in both data frames will allow the subsequent merging of qfasar results with the tombstone information.

Predator signature data

The predator signature data frame must be formatted exactly like the prey signature data frame (see Prey signature data), with each record being comprised of a predator type, a unique sample ID, and signature proportions.

The only difference between how prey and predator data are handled by qfasar is that the first column in the predator data frame is used to designate types of predators for which separate estimates of mean diet composition are desired. For example, Rode et al. (2014) estimated mean diet composition for four sex-age classes of polar bears, so the first column might contain the strings adult-male, adult-female, subadult-male, and subadult-female. If separate mean estimates are not desired for predator types, each row of the data frame should have an identical string in the first column, such as polar-bear.

NOTE: the predator data file can contain columns with additional “tombstone” information such as sex, age, sampling date, sampling location, etc. If so, read the data file into a data frame and from that data frame create a second data frame in the required format. Including the sample ID in both data frames will allow the subsequent merging of qfasar results with the tombstone information.

Fatty acid suite data

The fatty acid suite data frame contains fatty acid names, one or more sets of calibration coefficients, and definitions for one or more suites of fatty acids. The data frame must strictly meet the following formatting requirements.

The first row of the data frame must contain header information, such as “fatty acid”, “CC 1”, “CC 2”, “Suite 1”, “Suite 2”, “Comment”. These can most easily be included by having the names in the first row of the data file and using the header=TRUE option when reading the data into R.
The second row of the data frame must list “use_me” in the first column, a 1 in the column for the set of calibration coefficients to be used, a 1 in the column for the fatty acid suite to be used, and a 0 in all other columns.
Starting with the third row, the first column must contain fatty acid names, which must exactly match the header names for the corresponding fatty acids in the prey and predator data frames (see Prey signature data).
Starting with the third row, Columns 2 to k must contain calibration coefficients for each fatty acid. Multiple sets of calibration coefficients can be in the data frame. The set to be used must contain a 1 in Row 2 and any others must contain a 0 in Row 2.
Starting with the third row, Columns k+1 to m must contain one or more definitions of fatty acid suites. Membership in a suite is defined by 0/1 indicators, with a 1 indicating membership. Definitions for multiple suites can be in the data frame. For example, two columns could contain indicators defining membership in the dietary and extended-dietary suites of fatty acids (Iverson et al. 2004). The suite to be used must contain a 1 in Row 2 and the others must contain a 0 in Row 2.
An optional last column can contain comments, but care must be taken not to include commas in the comment field if the data are stored as csv files.

Use of the first row in the data frame to denote the set of calibration coefficients and fatty acid suite to be used in an analysis is intended to allow all available sets of calibration coefficients and all potential suites of fatty acids to be conveniently stored in the same file, while permitting fast and easy switching between them. It also allows all fatty acid data to be stored in a single file, which is more convenient and less confusing than having separate data files for each suite of fatty acids that one might want to use.

The first few rows of a csv fatty acid suite data file follow below as an example. This example has a single set of calibration coefficients (harbor_seal), definitions for two suites of fatty acids (dietary and ext_dietary), and comments in the last column. Note that extra spaces have been added between commas and the subsequent values to enhance readability here, but extra spaces should not be included in the data file.

fatty_acid, harbor_seal,  dietary,  ext_dietary,  comments
use_me,     1,            0,        1,            Values from Rosen and Tollit (2012) unless noted
c12.0,      0.863,        0,        0,  
c13.0,      0.516,        0,        0,  
Iso14,      0.806,        0,        0,  
c14.0,      0.588,        0,        1,  
c14.1w9,    0.823,        0,        0,  
c14.1w7,    2.853,        0,        0,  
c14.1w5,    12.137,       0,        0,  
Iso15,      0.538,        0,        0,  
Anti15,     0.785,        0,        0,  
c15.0,      0.673,        0,        0,  
c15.1w8,    0.942,        0,        0,  
c15.1w6,    6.372,        0,        0,  
Iso16,      0.812,        0,        0,  
c16.0,      0.479,        0,        1,  
c16.1w11,   1.898,        0,        0,  
c16.1w9,    3.506,        0,        0,  
c16.1w7,    2.377,        0,        1,  
c7Me16.0,   0.976,        0,        0,  
c16.1w5,    1.55,         0,        0,  
c16.2w6,    0.745,        1,        1,  
Iso17,      0.864,        0,        0,  
c16.2w4,    0.954,        1,        1,  
c16.3w6,    0.759,        1,        1,  
c17.0,      0.1,          0,        1,  
c16.3w4,    0.552,        1,        1,
c17.1,      1.618,        0,        0,
c16.3w1,    1,            1,        1,            From Nordstrom et al. (2008), BDL in Rosen & Tollit

Preparing for an analysis

The first steps in an analysis performed in qfasar involve reading the fatty acid suite, prey signature data, and possibly predator signature data into data frames (see Data frames). This will most likely be accomplished using the base R function read.csv(), or a related function, using code similar to that in the following code snippet.

# Specify directories and files.
my_data_dir <- "C:/Research/QFASA/My_Project"
my_fa_suites <- "my_fa_file"
my_prey_sigs <- "my_prey_data"
my_pred_sigs <- "my_pred_data"

# Set the working directory.
setwd(my_data_dir)

# Read the fatty acid suite information and signatures into data frames.
df_fa <- read.csv(file = my_fa_suites, header = TRUE)
df_prey <- read.csv(file = my_prey_sigs, header = TRUE)
df_pred <- read.csv(file = my_pred_sigs, header = TRUE)

The initial data frames read from files may need to be modified to get them in the format required by qfasar (see Data frames). Once the data frames have been correctly formatted, the following function calls are, or may be, required to prepare the data for analysis:

A required call to the function prep_fa() with the fatty acid suites data frame.
A required call to the function prep_sig() with the prey data frame.
A call to the function prep_sig() with the predator data frame is required if the analysis will involve predator signature data.
A call to the function cc_aug() is required if signatures are augmented (see Scaling signatures and Calibration coefficient for an augmented proportion.

The tasks performed by each of the functions mentioned in the above list are described in the follow subsections.

Preparing fatty acid suite information
Preparing signature data
Calibration coefficient for an augmented proportion

Preparing fatty acid suite information

The function prep_fa() takes the fatty acid suite data frame as input and returns a list containing the following objects.

cc - A numeric vector of the calibration coefficients to be used.
use - A logical vector defining the fatty acid suite to be used.
fa_names - A character vector of all fatty acid names.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

The following code snippet contains an example call to prep_fa().

# Process the fatty acid suite information.
fa_info <- prep_fa(df_fa = df_fa)
str(fa_info)

Preparing signature data

The function prep_sig() modifies raw fatty acid signatures in two important ways to prepare them for analysis, replacing invalid signature proportions and scaling signatures, each of which is described below.

Modifying invalid signature proportions
Scaling signatures

prep_sig() returns a list containing the following objects. Note that many of these objects are required inputs to other functions in qfasar.

type - A character vector of the type of each signature.
id - A character vector of the unique sample ID of each signature.
n_types - The number of unique types.
uniq_types - A character vector of the unique types, sorted alphanumerically.
n_sig - The total number of signatures.
type_ss - The number of signatures (sample size) for each unique type.
loc - A vector or matrix giving the first and last locations of the signatures of each type, after being sorted by type and id.
sig_rep - A vector or matrix of the original signatures, with any values missing or less than or equal to 0 replaced, in column-major format.
n_fa_rep - The number of fatty acids in sig_rep.
sig_scale - A vector or matrix of scaled signatures ready for analysis, sorted by type and id, in column-major format.
n_fa_suite - The number of fatty acids in sig_scale.
fa_suite - A character vector of the names of fatty acids in the suite to be used in the analysis.
zero_rep_val - A constant associated with the method and value to be used to replace proportions that are missing or less than or equal to 0.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

Modifying invalid signature proportions

Any signature proportions that are negative, missing values (value NA in R), or equal to zero are generally invalid. prep_sig() replaces these invalid values, usually with a small positive constant (with one exception described below). This is required because the definitions of two popular distance measures, the Aitchison (Stewart et al. 2014) and the Kullback-Leibler (Iverson et al. 2004) involve logarithms and log(0) is not defined. prep_sig() implements two slightly different methods of replacing invalid proportions, the choice of method being controlled by the argument zero_rep in the call to prep_sig().

If 0 <= zero_rep <= 0.01, the specified value is used to replace invalid proportions. Note that if either the Aitchison or Kullback-Leibler distance measure will be used in an analysis, the value of zero_rep must be strictly greater than 0. The chi-square distance measure (Stewart et al. 2014) is defined for proportions of zero, so the value of zero_rep may equal zero if that distance measure will be used in the analysis. Information on the distance measures implemented in qfasar is available in the section Distance measures below.

If 10 <= zero_rep <= 100, the value is interpreted as a percentage, the smallest non-zero proportion in the signature data is multiplied by the percentage and divided by 100, and the result is used to replace invalid proportions. The default value for zero_rep is 75, a rather uninformed choice.

No matter which of these two methods is used to replace invalid proportions, the modified signatures are then scaled to sum to 1.0 using the multiplicative method (Martin-Fernandez et al. 2011).

Note that for analyses involving both prey and predator signatures, such as estimation of diet composition, it may be advisable to use a common value to replace invalid proportions in both prey and predator signatures. One way to accomplish this is to call prep_sig() with the either the prey or predator data frame, and then use the value it returns as zero_rep_val in the call with the other data frame. The following code snippet presents an example of this.

# Prepare the signatures for analysis.
prey_info <- prep_sig(df_sig = df_prey,
                      fa_names = fa_info$fa_names,
                      use_fa = fa_info$use,
                      zero_rep = 90)
str(prey_info)
pred_info <- prep_sig(df_sig = df_pred,
                      fa_names = fa_info$fa_names,
                      use_fa = fa_info$use,
                      zero_rep = prey_info$zero_rep_val)
str(pred_info)

Scaling signatures

Most QFASA applications use a subset of all available fatty acids (e.g., Iverson et al. 2004; Wang et al. 2010; Bromaghin et al. 2013). The traditional analytical method is to censor the fatty acids that are not wanted and scale the proportions of the remaining fatty acids so they sum to 1.0. However, Bromaghin et al. (2016b) found that the traditional scaling approach can meaningfully bias diet estimates under some conditions. They explored the performance of two alternative scaling options and found that both avoid the bias caused by traditional scaling and have comparable performance. prep_sig() implements all three scaling options evaluated by Bromaghin et al. (2016b).

The choice of the scaling option is controled by the prep_sig() argument scale.

scale = 1 Traditional scaling; the proportions within each censored signature are scaled to sum to 1.0. This option is not recommended for routine use in QFASA applications, as Bromaghin et al. (2016b) found that it can meaningfully bias diet estimates under some conditions. It is implemented here to provide compatibility with traditional methods and to potentially facilitate future research.
scale = 2 The proportions within each censored signature are not scaled, so each signature will have a different “partial sum” (see Bromaghin et al. (2016b) for a definition of that term).
scale = 3 Each censored signature is augmented with an additional proportion whose value equals the sum of the censored proportions, so that the proportions in each augmented signature sum to 1.0. This is the default option.

The following code snippet illustrates a call to the function prep_sig() that overrides the default value of the scale argument.

pred_info <- prep_sig(df_sig = df_pred,
                      fa_names = fa_info$fa_names,
                      use_fa = fa_info$use,
                      zero_rep = prey_info$zero_rep_val,
                      scale = 1)

Calibration coefficient for an augmented proportion

Calibration coefficients provide a one-to-one mapping between the prey and predator spaces (Bromaghin et al. 2015). However, when using signature augmentation (see Scaling signatures), no calibration coefficient is available for the augmented proportion. In this circumstance, and if an analysis will involve diet estimation or otherwise require use of calibration coefficients, the call to prep_sig() with the prey library must be followed by a call to cc_aug() before the analysis can proceed.

The following code snippet illustrates a call to the function cc_aug().

# Compute the calibration coefficient for the augmented proportion.
new_cc <- cc_aug(sig_rep = prey_info$sig_rep,
                 sig_scale = prey_info$sig_scale,
                 cc_all = fa_info$cc,
                 use_fa = fa_info$use,
                 dist_meas = 1)

The algorithm implemented in cc_aug() is straightforward (Bromaghin In press). The calibration coefficients cc are used to transform complete prey signatures in sig_rep to the predator space, and the transformed signatures are censored to the suite of fatty acids defined by the argument use_fa and then augmented (Bromaghin et al. 2016b). The subset of calibration coefficients in the suite definition use_fa are combined with an initial calibration coefficient for the augmented proportion (initial guess of 1), the censored signatures in sig_scale are also transformed to the predator space, and the total distance between the two sets of censored signatures is computed. The calibration coefficient for the augmented proportion is taken as the value that minimizes the total distance. The function Rsolnp::solnp() (Ghalanos & Theussl 2014) is used to minimize the distance between the two sets of censored signatures.

cc_aug() returns a list containing the following objects.

cc - A numeric vector of calibration coefficients for the augmented signatures.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

Distance measures

qfasar implements three distance measures that have been used by QFASA practitioners and researchers:

Aitchison
Kullback-Leibler
Chi-square

A form of the Kullback-Leibler distance was recommended by Iverson et al. (2004). The Kullback-Leibler distance represents a balance between the absolute difference between two proportions and the relative difference or ratio between the proportions. Iverson et al. (2004) did some work to compare the Kullback-Leibler distance with several other distance measures, such as least squares, but the extent and results of those comparisons were not well-documented.

The Aitchison distance (Aitchison et al. 2000) has been used in some recent methodology work (e.g., Stewart and Field 2011; Bromaghin et al. 2015, 2016a, 2016b). It is based on differences between the logarithms of the ratio of a fatty acid proportion and the geometric mean of all proportions in a signature, and is therefore a relative measure.

Stewart et al. (2014) used the “chi-square” distance measure. This measure is based on the ratio of the squared difference between proportions and their sum, after the proportions have been power transformed and reclosed (scaled to sum to 1). With a power parameter of 1, this measure is therefore based on squared differences. However, Stewart et al. (2014) state that this measure asymptotically approaches the Aitchison under some conditions as the value of the power parameter approaches 0.

The following summary comments are collectively based on the simulation work of Bromaghin et al. (2015, 2016a, 2016b) to compare the performance of QFASA estimators. The estimator based on the Aitchison distance tends to have less bias than the Kullback-Leibler estimator. An additional advantage of the Aitchison distance is that it produces identical diet estimates in both the prey and predator spaces (Estimation spaces), subject to slight differences due to numerical optimization. For those reasons, it is the default distance measure in qfasar. However, with some prey libraries and predator diets, estimates based on the Kullback-Leibler distance have had less bias. More limited simulation work has found the chi-square distance to perform similarly to the Kullback-Leibler, which is not surprising as a power parameter of 1 Chi-square distance power parameter was used and so the chi-square distance was essentially based on relative squared differences. The performance of the chi-square distance for different values of the gamma parameter has not yet been tested. Importantly, this body of simulation work found that the interaction of a prey library, a specific diet composition, and the values of the calibration coefficients interact strongly to influence QFASA performance, and this strong interaction makes it difficult to make universal recommendations regarding methodological options, including the choice of a distance measure.

Chi-square distance power parameter

The chi-square distance involves a power transformation of signature proportions, with the power parameter being denoted gamma. The function comp_chi_gamma() implements the algorithm of Stewart et al. (2014) to find a suitable value of gamma.

The algorithm is initialized with inv_gamma equal to 1 and gamma is computed as 1/inv_gamma. The distances between all possible pairs of full signatures are computed (distances). For each pair of full signatures, the distances between all possible sub-signatures comprised of only two fatty acid proportions are computed (sub-distances). The proportion of sub-distances that exceed their corresponding distance is computed across all possible pairs of signatures. If that proportion is less than a user specified argument near_zero, the function returns with gamma equal to 1; otherwise, the function enters an iterative phase. At each iteration, inv_gamma is incremented by 1, gamma is computed as 1/inv_gamma, distances and sub-distances are recomputed, and the proportion of the sub-distances that exceed their corresponding distance is recomputed. The algorithm terminates when the proportion is less than the argument near_zero or the value of gamma is less than min_gamma.

The argument space must equal 1 or 2 (see Estimating individual diet and variance). If its value is 1, the calibration coefficients are used to map the signatures to the predator space prior to initializing the algorithm.

The following code snippet contains an example call to comp_chi_gamma().

# Estimate the chi-square distance gamma parameter.
gamma_est <- comp_chi_gamma(sigs = prey_info$sig_rep,
                            cc = fa_info$cc,
                            near_zero = 0.00001,
                            min_gamma = 0.05,
                            space = 1)
str(est_gamma)

comp_chi_gamma() returns a list containing the following objects:

gamma - The estimated value of gamma.
gamma_vec - A numeric vector containing the value of gamma at each step of the iteration.
prop_vec - A numeric vector containing the proportion of all possible two-element signatures with distance exceeding that of the full signatures at each step of the iteration. This value is compared to the argument near_zero.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

In the above code block, note that gamma is estimated using the full signatures with any values equal to zero or missing replaced (prey_info$sig_rep) and all calibration coefficients. If using augmentation, one might initially think to call comp_chi_gamma() with augmented signatures (Scaling signatures). However, specification of a distance measure is required to obtain a calibration coefficient for the augmented proportion (Calibration coefficient for an augmented proportion). Consequently, if augmentation is used, gamma must be estimated using the full signatures. If traditional scaling is to be used (scale <- 1), one could estimate gamma either with the full signatures or with the scaled signatures, though we have no idea how similar the two estimates of gamma would be; with traditionally scaled signatures, one would need to restrict the calibration coefficients to those associated with fatty acids in the scaled signatures, i.e., my_ccs <- fa_info$cc[fa_info$use]. If one is not scaling signatures (scale <- 2), they will be scaled internally for the computation of gamma.

As the number of signatures in the library and/or the number of fatty acids in a signature increases, the number of possible pairs of signatures and the number of all possible two-proportion sub-signatures increases rapidly. Consequently, comp_chi_gamma() is likely to require long run times. However, it only needs to be run once for any particular library of signatures.

I have little experience with the chi-square distance and with different values of the gamma parameter. Stewart et al. (2014) state that the chi-square distance approaches the Aitchison distance as gamma decreases. However, as gamma decreases, the power-transformed signature components become increasingly similar in magnitude, a characteristic which seems counterintuitive for a distance measure, at least on the surface. Consequently, I provide no recommendations regarding gamma and provide this functionality for the sake of completeness and to facilitate future comparative research into the performance of QFASA with the chi-square distance and a range of values for gamma.

Estimation spaces

Bromaghin et al. (2015) introduced the terms “prey space” and “predator space”. The prey space is simply the simplex in which the prey signatures reside and, similarly, the predator space is the simplex in which the predator signatures reside. The spaces differ due to predator metabolism of ingested prey tissue and the resulting modification of proportions. As mentioned earlier (Calibration coefficient for an augmented proportion), calibration coefficients provide a one-to-one mapping or transformation between the prey and predator spaces.

Diet estimation and other analyses, such as computer simulation, can be performed in either space. Iverson et al. (2004) used the calibration coefficients to map predator signatures to the prey space for diet estimation. Bromaghin et al. (2013) took the opposite approach, mapping prey signatures to the predator space for diet estimation. As diet estimation involves modeling predator signatures, operating in the predator space may seem more natural, but simulation work has not revealed any strong reason to prefer one space over the other (Bromaghin et al. 2015). However, there is one important issue with respect to estimation space to be aware of involving distance measures (Distance measures).

Estimating diet composition with the Kullback-Leibler distance measure may produce different estimates in the two spaces (Bromaghin et al 2015). This difference is caused by the component of the Kullback-Leibler distance that involves absolute differences between proportions, in combination with the change in the magnitude of proportions induced by application of the calibration coefficients. Unfortunately, I am unaware of a means to determine which set of estimates are better in any particular investigation. Although the sensitivity of the chi-square distance measure to estimation space has not been explicitly explored, its definition is based on absolute differences between power-transformed proportions and I therefore suspect estimates obtained in the two spaces would also differ. The Aitchison distance is scale-invariant and estimates therefore do not differ as a function of the estimation space in which they are obtained, except for small differences attributable to numerical optimization. This stability is one reason the Aitchison distance is the default distance measure in qfasar.

Diet estimation

Diet estimation is the central purpose of qfasar and a variety of estimation options are implemented in the function est_diet(). Estimation options include choices for estimation space, measuring distance between signatures, estimating mean diet composition, and estimating the variances of individual and mean diet composition estimates. Please refer to the sections Estimation spaces and Distance measures for information on those topics.

The data arguments passed in a call to est_diet() are intended to be the objects returned by calls to prep_fa() with the fatty acid suites data frame, prep_sig() with the predator data frame, and prep_sig() with the prey data frame (see Preparing for an analysis). The remaining arguments are integer indicators of method selection or bootstrap sample sizes, described in subsequent subsections.

The following block of code presents a complete example from reading data files to diet estimation.

# Specify directories and file names.
my_data_dir <- "C:/Research/QFASA/My_Project"
my_fa_suites <- "my_fa_file"
my_prey_sigs <- "my_prey_data"
my_pred_sigs <- "my_pred_data"

# Specify constants.
my_zero_rep <- 90
my_scale <- 3
my_dist_meas <- 1
my_gamma <- NA
my_space <- 1
my_ind_boot <- 100
my_mean_meth <- 1
my_var_meth <- 1
my_mean_boot <- 250

# Set the working directory.
setwd(my_data_dir)

# Read and prepare fatty acid information.
df_fa <- read.csv(file = my_fa_suites, header = TRUE)
fa_info <- prep_fa(df_fa = df_fa)
str(fa_info)

# Read and prepare prey signatures.
df_prey <- read.csv(file = my_prey_sigs, header = TRUE)
prey_info <- prep_sig(df_sig = df_prey,
                      fa_names = fa_info$fa_names,
                      use_fa = fa_info$use,
                      zero_rep = my_zero_rep,
                      scale = my_scale)
str(prey_info)

# Read and prepare predator signatures.
df_pred <- read.csv(file = my_pred_sigs, header = TRUE)
pred_info <- prep_sig(df_sig = df_pred,
                      fa_names = fa_info$fa_names,
                      use_fa = fa_info$use,
                      zero_rep = prey_info$zero_rep_val,
                      scale = my_scale)
str(pred_info)

# I used signature augmentation in the above call to prep_sig(), i.e., scale == 3.
# Consequently need to compute a calibration coefficient for the augmented proportion.
new_cc <- cc_aug(sig_rep = prey_info$sig_rep,
                 sig_scale = prey_info$sig_scale,
                 cc_all = fa_info$cc,
                 use_fa = fa_info$use,
                 dist_meas = my_dist_meas,
                 gamma = my_gamma)

# Estimate predator diets.
# Use cc = fa_info$cc[fa_info$use] if signatures are not augmented.
diet_est <- est_diet(pred_sigs = pred_info$sig_scale,
                     pred_uniq_types = pred_info$uniq_types,
                     pred_loc = pred_info$loc,
                     prey_sigs = prey_info$sig_scale,
                     prey_uniq_types = prey_info$uniq_types,
                     prey_loc = prey_info$loc,
                     cc = new_cc$cc,
                     space = my_space,
                     dist_meas = my_dist_meas,
                     gamma = my_gamma,
                     ind_boot = my_ind_boot,
                     mean_meth = my_mean_meth,
                     var_meth = my_var_meth,
                     mean_boot = my_mean_boot)
str(diet_est)

est_diet() returns a list containing the following elements:

pred_sigs - A numeric matrix of predator signatures in column-major format in the optimization space indicated by the argument space.
prey_sigs - A numeric matrix of prey signatures in column-major format in the optimization space indicated by the argument space.
mean_sigs - A numeric matrix of mean prey-type signatures in the optimization space indicated by the argument space.
est_ind - A numeric matrix of the estimated diet compositions of individual predators.
conv - A logical vector indicating whether the optimization function successfully converged.
obj_func - A numeric vector of the values of the objective function at each predator’s estimated diet.
mod_sigs - A numeric matrix of the modeled signature of each predator at its estimated diet.
var_ind - A three-dimensional numeric array containing the estimated variance matrix for the estimated diet composition of each predator. The predator indexes the third dimension.
est_mean - A numeric matrix containing the estimated mean diet composition of each predator type.
conv_mean - A logical vector indicating whether the estimated mean diet composition of each predator type is based on at least one diet estimate that converged.
var_mean - A three-dimensional numeric array containing the estimated variance matrix for the estimated mean diet composition of each predator type. The predator type indexes the third dimension.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

NOTE: The numerical optimization and bootstrap sampling performed by est_diet() are numerically intensive and can result in long run times. Patience is advised! The primary factors causing slow execution are the number of predator signatures, the number of predator and prey types, and the bootstrap sample sizes.

Options for estimating individual and mean diet composition, the variance of diet composition estimates, and adjusting diet estimates for differential prey fat content are described in the following subsections.

Estimating individual diet and variance
Estimating mean diet and variance
Adjusting for fat content

Estimating individual diet and variance

The diet composition of an individual predator is estimating by minimizing a measure of distance (Distance measures) between its observed signature and a modeled signature computed from the diet composition and mean prey-type signatures (Iverson et al. 2004). est_diet() uses the function Rsolnp::solnp() to minimize the distance (Ghalanos & Theussl 2014). The selection of a distance measure is controlled by the argument dist_meas:

dist_meas == 1 yields the Aitchison distance measure.
dist_meas == 2 yields the Kullback-Leibler measure.
dist_meas == 3 yields the chi-square distance measure.

Diet estimation can occur in either the predator or prey space (Bromaghin et al. 2015; Estimation spaces), controlled by the argument space.

space == 1 indicates estimation should occur in the predator space. This is the default value.
space == 2 indicates estimation should occur in the prey space.

The variance matrix of an individual diet composition estimate is estimated through bootstrap sampling. The signatures of each prey type are indepedently sampled with replacement, with sample sizes equal to the prey-type sample sizes in the prey library, and amalgamated to construct a bootstrapped prey library. The bootstrapped prey library is then used to estimate the predator’s diet. This is repeated a specified number of times (argument ind_boot), resulting in replicated estimates of the diet of each predator. The empirical variance and covariance of the replicated prey-type estimates are used to construct a covariance matrix for each predator (Beck et al. 2007, Bromaghin et al. 2015). Note that the prey library is resampled independently for each predator.

If a particular analysis does not require the variances of the diet estimates for individual predators to be estimated, variance estimation can be disabled by setting the individual bootstrap sample size equal to zero (ind_boot <- 0).

If estimated, the variance matrices, one for each predator, are returned via the three-dimensional array var_ind. The predators index the third dimension. The following code snippet illustrates how to extract the variance information for an individual predator.

# Specify a predator.
this_pred <- 1

# Extract the variance matrix for this predator.
var_mat <- diet_est$var_ind[,,this_pred]
print(var_mat)

# Extract the standard errors for this predator.
se <- sqrt(diag(var_mat))
print(se)

Estimating mean diet and variance

Mean diet composition can be estimated for each type in the predator signature data (see Data frames). est_diet() provides three options for estimating mean diets, controlled by the argument mean_meth.

mean_meth == 0 skips estimation of mean diet composition.
mean_meth == 1 returns the empirical estimate of mean diet composition.
mean_meth == 2 returns the “parameterized mean” estimate of mean diet composition.

The empirical mean estimator is simply the average of the estimated diets for individual predators within each predator type.

The “parameterized mean” estimator is an unpublished estimator that I have experimented with to a limited extent. This estimator reparameterizes the QFASA model with a single vector of diet proportions common to all predators and mean diet is estimated by minimizing the distance between the mean signature modeled from the mean diet proportions and each predator’s observed signature, summed over all predators. The parameterized mean estimator has not yet been thoroughly tested and its inclusion in qfasar is intended to facilitate planned research. Our limited work with the parameterized mean estimator to date suggests that it may perform well when predator signatures are homogeneous, but may be more sensitive to the presence of “outlier” signatures in the predator data than the empirical estimator.

est_diet() returns the estimated mean diet compositions in the object est_mean.

qfasar implements two methods of estimating the variance of mean diet composition estimates, the variance estimator of Beck et al. (2007) and a bootstrap estimator, controlled by the argument var_meth.

var_meth == 0 skips variance estimation.
var_meth == 1 yields the bootstrap estimator. This is the default value.
var_meth == 2 yields the estimator of Beck et al. (2007), generalized to produce a variance matrix.

The bootstrap estimator draws independent samples of each prey type to form a bootstrap prey library, exactly as in Estimating individual diet and variance, and also a random sample of each predator type, with replacement and sample sizes equal to the observed sample sizes. Mean diet is estimated using the bootstrapped prey and predator signatures and the method indicated by the argument mean_meth. The argument mean_boot controls the number of times this process is replicated, and the replications are used to estimate the covariance matrix of the mean diet composition estimates for each predator type. Unpublished work indicates that the bootstrap estimator is more reliable than the Beck et al. (2007) estimator.

Note that the Beck estimator cannot be used in combination with the parameterized-mean estimator of mean diet composition.

est_diet() returns the estimated variances of the predator-type mean diet composition estimates in the object var_mean.

The following code snippet illustrates one method of extracting the diagonal of the variance matrix for all predator types, computing the standard errors, and combining mean diet estimates and associated standard errors into a single matrix.

# There must be a more elegant way to extract all the diagonals into a matrix,
# but the following works.

# Allocate memory for the standard errors.
stderr <- matrix(data = 0, nrow = prey_info$n_types, ncol = pred_info$n_types)
colnames(stderr) <- colnames(diet_est$est_mean)
rownames(stderr) <- rownames(diet_est$est_mean)

# Extract the diagonal of the variance matrix for each predator type and compute
# the standard errors. Note: I use li# for "loop index number".
for(li1 in 1:pred_info$n_types){
  stderr[,li1] <- sqrt(diag(diet_est$var_mean[,,li1]))
}

# Combine estimates of means and standard errors in one matrix.
est_comb <- rbind(diet_est$est_mean, stderr)
rownames(est_comb) <- c(paste("Mean", prey_info$uniq_types, sep = " - "),
                        paste("SE", prey_info$uniq_types, sep = " - "))

# Transpose the matrix (if desired?).
est_comb <- t(est_comb)

Adjusting for fat content

The function est_diet() estimates diet composition in terms of the fatty acid mass consumed (units [fatty acid mass]). Such estimates may be of direct ecological interest, especially for ecosystems in which lipids are particularly important. In other situations, one may wish to transform estimates to a different scale (Iverson et al. 2004). The qfasar function adj_diet_fat() performs such transformations.

adj_diet_fat() takes constants specifying the differential fat content of prey types, one or more estimates of diet composition, and potentially one or more variance matrices as input arguments. The diet composition and variance objects are intended to be the corresponding objects returned by the function est_diet(). Either estimates for individual predators or mean estimates for predator types may be passed to the function.

Note that the units of the fat-content constants (input argument prey_fat) control the units of the resulting transformed diet estimates. For example, if the units of prey_fat are [fatty acid mass]/[animal mass], the adjusted estimates are in terms of the relative mass of prey consumed (e.g., Bromaghin et al. 2013). Alternatively, if the units of prey_fat are [fatty acid mass]/[animal], the adjusted estimates are in terms of the relative numbers of prey animals consumed. It is the user’s responsibility to know the units associated with the fat content values and interpret the adjusted estimates correctly. In any case, the variance of the transformed diet estimates is estimated from the input arguments using the delta method (Seber 1982)

NOTE: values of the fat parameters are treated as known constants without variance.

The following code snippet illustrates an example call to adj_diet_fat().

# Estimate predator diets in terms of fatty acid mass consumed.
mass_est <- est_diet(pred_sigs = pred_info$sig_scale,
                     pred_uniq_types = pred_info$uniq_types,
                     pred_loc = pred_info$loc,
                     prey_sigs = prey_info$sig_scale,
                     prey_uniq_types = prey_info$uniq_types,
                     prey_loc = prey_info$loc,
                     cc = new_cc$cc,
                     space = my_space,
                     dist_meas = my_dist_meas,
                     gamma = my_gamma,
                     ind_boot = my_ind_boot,
                     mean_meth = my_mean_meth,
                     var_meth = my_var_meth,
                     mean_boot = my_mean_boot)


# Specify fat content, one value for each prey type and in the alphanumeric order of the
# prey-type names. Here, I generate random values purely for illustrative purposes,
# because I do not have any fat data handy.  With values, use the concatenate function:
# fat_vals <- c(val1, val2, ...).
fat_vals <- runif(n = length(prey_info$uniq_types), min = 0.25, max = 1.75)
names(fat_vals) <- prey_info$uniq_types


# Transform individual estimates of diet composition in terms of fatty acid mass
# to estimates in terms of some other desired scale. Note that the units of the
# fat constants determine the resulting scale. Ideally, the units would be
# (fatty acid mass)/(prey mass) or (fatty acid mass)/(prey animal).
trans_est_ind <- adj_diet_fat(prey_fat = fat_vals,
                              diet_est = mass_est$est_ind,
                              diet_var = mass_est$var_ind)
                              
# Transform mean estimates of diet composition in terms of fatty acid mass
# to estimates in terms of some other desired scale.                              
trans_est_mean <- adj_diet_fat(prey_fat = fat_vals,
                               diet_est = mass_est$est_mean,
                               diet_var = mass_est$var_mean)

str(trans_est_mean)

adj_diet_fat() returns a list containing the following objects:

diet_est - A numeric vector or matrix of the estimated diet composition of individual predator(s) or predator type(s) in the units represented by the argument prey_fat.
diet_var - A numeric matrix or array containing the estimated variance matrix for the estimated diet of individual predator(s) or predator type(s), in the units represented by prey_fat.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

Diagnostic functionality

qfasar contains diagnostic functionality that may be useful for some applications. Those aspects of qfasar are described in the following subsections.

Exploring signatures for structure
Leave-one-prey-out analysis
Signature & calibration coefficient consistency
Goodness of fit

Exploring signatures for structure

Prey and, perhaps to a lesser extent, predator signatures come with some pre-defined structure. A sample of signatures is required from each potential prey type, often species, so those a priori prey types represent structure within a prey library. Similarly, predator data may be compiled from some number of recognized classes, for example sex, age, or size classes. Such pre-defined structure helps frame an analysis in important ways, such as defining the prey types to which estimates of diet composition will pertain or classes of predators for which separate estimates of mean diet composition are desired.

However, one may reasonably wonder if additional structure exists within the signatures. For example, within a prey library structured by species, might there be clusters of individuals within one or more species that are more similar within clusters than between clusters? If so, there may be advantages to subdividing the prey library into a larger number of types.

qfasar implements an algorithm referred to as divisive magnetic clustering (dimac; Bromaghin et al. Under review) in the function dimac() to explore a collection of signatures for additional structure within defined types. The dimac algorithm was modified from the diana algorithm of the package cluster (Maechler et al. 2016). The algorithm is initialized with all signatures in one cluster. The first two “magnets”” are chosen as the two signatures having the greatest distance between them, using a distance measure of the user’s choice (Distance measures), and each signature is drawn to the nearest magnet to form clusters. The algorithm then enters an iterative phase. At each iteration, the cluster with the greatest average distance between its signatures and the mean signature is identified as the “active” cluster. The two signatures within the active cluster having the greatest distance between them are selected as new magnets. One of the two new magnets replaces the original magnet for the active cluster and the second starts the formation of an additional cluster. Each signature is drawn to the nearest magnet to form clusters, without regard for its cluster membership in the preceding iteration. Consequently, the algorithm is not simply bifurcating, but rather is much more dynamic and flexible. The iterations continue until each signature is in its own cluster.

dimac() is intended to be called after the fatty acid signatures have been prepared for analysis by calls to the functions prep_fa() and prep_sig(). The following code snippet illustrates a typical call to dimac().

# Explore prey signatures for sub-structure.
my_clust <- dimac(sigs = prey_info$sig_scale,
                  id = prey_info$id,
                  type = prey_info$uniq_types,
                  loc = prey_info$loc,
                  dist_meas = my_dist_meas)
str(my_clust)

dimac() returns a list containing the following elements:

clust - A data frame denoting cluster assignments at each iteration of the algorithm.
clust_dist - A numeric matrix of the summed distance within clusters at each iteration.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

Unfortunately, there is no objective method to determine the most appropriate number of clusters. It may seem tempting to use a large number of clusters in subsequent analyses to exploit fine scale structure in the data. Indeed, Meynier et al. (2010) used individual prey animals as prey types, so one might argue that having a large number of clusters is acceptable. One disadvantage of using a large number of prey types is that the speed of numerical optimization will be slow with large prey libraries. Also, partitioning a prey library into a large number of clusters can increase prey confounding between sub-types (Bromaghin et al. Under review). My recommendation is to use the results of dimac() conservatively and only partition prey types into multiple clusters if there is strong evidence for doing so. Examine the distance results in the object clust_dist returned by make_prey_part() and identify any large reductions in distance that are followed by a more gradual decrease in distance as the number of clusters increases further. An example of this pattern is provided by the prey type Gaspereau from the Scotian fish prey library (Budge et al. 2002; Bromaghin et al. 2016a).

Number_of_clusters <- 1:41
Distance <- c(209.8, 190.4, 107.6, 96.3, 88.8, 82.5, 76.5, 72.6, 64.2, 59.8,
              55.2, 48.3, 45.2, 42.1, 40.0, 33.9, 31.8, 29.5, 26.9, 25.3,
              23.0, 21.2, 19.1, 17.3, 16.2, 14.7, 14.8, 13.6, 11.1, 9.9,
              8.5, 7.3, 6.5, 5.6, 5.1, 4.1, 3.2, 2.4, 1.4, 0.6,
              0.0)
plot(x = Number_of_clusters, y = Distance, main="Scotian: Gaspereau")

Moving from one to two clusters provides little reduction in the summed distance within clusters, but the increase to three clusters is associated with a substantial drop in distance that is likely attributable to the discovery of some true structure within the signatures. As the number of clusters increases above three, distance decreases rather continuously to zero. I interpret this result as evidence that three clusters exist within the Gaspereau prey type and, in fact, the three clusters correspond well to ancillary information on the year and location samples were collected (Bromaghin et al. Under review).

Note that for diet estimation applications, partitioning a prey library into more clusters than the number of fatty acids used to estimate diet may result in estimates that are not unique. In such a case, estimates of diet composition need to be pooled into a smaller number of “reporting groups” (e.g., Bromaghin 2008; Meynier et al. 2010), which one would hope would largely eliminate any lack of uniqueness. See Bromaghin et al. (Under review) for a more detailed discussion of this issue.

Using a prey partition

To use the results of dimac() prey partitioning for diet estimation or some other analysis, the user needs to determine how many clusters each prey type should be partitioned into. To do so,

Examine the results of the matrix “clust_dist” returned by the function dimac() and the number of clusters for each prey type.
Construct an integer vector with the desired number of clusters for each prey type.
Call the function make_prey_part().

The following code block continues the above example of Gaspereau and the Scotian fish prey library. Note that this prey library has 28 prey types; Gaspereau are the fifth prey type.

# Specify the number of clusters for each prey type.
n_clust <- c(2, 1, 4, 3, 3, 3, 1, 3, 1, 2,
             2, 2, 3, 2, 3, 1, 1, 1, 3, 1,
             1, 1, 1, 1, 2, 2, 2, 1)

# Partition the prey library.
prey_part <- make_prey_part(sig = prey_info$sig_scale,
                            clust = my_clust$clust,
                            n_clust = n_clust)

make_prey_part() performs two important functions. First, it uses the information on the desired number of clusters for each prey type to construct a new prey signature library with modified prey-type names that is ready for diet estimation or other analyses. Second, it constructs two matrices to facilitate pooling diet estimates made on the basis of the partitioned library to the original prey types in the un-partitioned library.

make_prey_part() returns a list containing the following elements:

type - A character vector of the partitioned prey type of each signature.
id - A character vector of the unique sample ID of each signature.
n_types - The number of unique types in the partitioned library.
uniq_types - A character vector of the unique types, sorted alphanumerically.
type_ss - The number of signatures for each unique type.
loc - A vector or matrix giving the first and last locations of the signatures of each type, after being sorted by type and id.
sig_part - A matrix of scaled signatures ready for analysis, sorted by the partitioned prey type and id, in column-major format.
pool_pre - A matrix to pre-multiply diet estimates associated with a partitioned library to pool estimates back to the original prey types.
pool_post - A matrix to post-multiply diet estimates associated with a partitioned library to pool estimates back to the original prey types.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

Pooling partitioned estimates to the original prey types

If significant structure is found within a prey library, estimating diet composition on the basis of a partitioned prey library may produce estimates with less bias and possibly less variation through a reduction in prey confounding (Bromaghin et al. Under review). However, if a partitioned library is used to estimate predator diet composition, one may wish to pool the estimates back to the original, unpartitioned prey types for reporting purposes. The functionality to do so is implemented in the function diet_pool().

diet_pool() takes information on the prey partition and estimates returned by a call to est_diet() (Diet estimation) as inputs and returns estimates pooled back to the original, unpartitioned prey types. The inputs to diet_pool() are

rep_grp - A reporting-group matrix intended to be the post-multiplication matrix returned by a call to make_prey_part() as the object pool_post. Alternatively, a user-defined matrix for customized pooling (see below). Each column defines a potentially pooled prey type.
est_ind - A numeric matrix of the estimated diet compositions of individual predators using a partitioned prey library, intended to be the object est_ind returned by a call to est_diet().
var_ind - A numeric array containing the estimated variance matrix for the estimated diet of each predator, intended to be the object var_ind returned by a call to est_diet().
est_mean - A numeric matrix containing the estimated mean diet of each predator type, intended to be the object est_mean returned by a call to est_diet().
var_mean - A numeric array containing the estimated variance matrix for the estimated mean diet of each predator type, intended to be the object var_mean returned by a call to est_diet().

The following code block combines example code from Diet estimation and the Scotian fish prey library example from earlier in this section. Note that this prey library has 28 prey types.

# Specify the number of clusters for each prey type.
n_clust <- c(2, 1, 4, 3, 3, 3, 1, 3, 1, 2,
             2, 2, 3, 2, 3, 1, 1, 1, 3, 1,
             1, 1, 1, 1, 2, 2, 2, 1)

# Partition the prey library.
prey_part <- make_prey_part(sig = prey_info$sig_scale,
                            clust = my_clust$clust,
                            n_clust = n_clust)

# Estimate predator diets.
my_est <- est_diet(pred_sigs = pred_info$sig_scale,
                   pred_uniq_types = pred_info$uniq_types,
                   pred_loc = pred_info$loc,
                   prey_sigs = prey_part$sig_part,
                   prey_uniq_types = prey_part$uniq_types,
                   prey_loc = prey_part$loc,
                   cc = new_cc$cc,
                   space = my_space,
                   dist_meas = my_dist_meas,
                   gamma = my_gamma,
                   ind_boot = my_ind_boot,
                   mean_meth = my_mean_meth,
                   var_meth = my_var_meth,
                   mean_boot = my_mean_boot)
str(my_est)

# Pool estimates to the original prey types.
pool_est <- diet_pool(rep_grp = prey_part$pool_post,
                      est_ind = my_est$est_ind,
                      var_ind = my_est$var_ind,
                      est_mean = my_est$est_mean,
                      var_mean = my_est$var_mean)
str(pool_est)

diet_pool() returns the same four estimation objects that it takes as input, but pooled to the original prey types.

NOTE: diet_pool() can be used to pool any estimates into a smaller number of combined prey types for reporting purposes, i.e., it does not necessarily have to be called with estimates from a partitioned library. For example, imagine a prey library with a large number of prey types, subsets of which have similar ecological function and signatures that are consequently somewhat similar (prey confounding, Bromaghin et al. 2016b). In such a case, one may wish to estimate diet on the basis of the full prey library, but subsequently pool the resulting estimates to a smaller number of combined prey types for reporting purposes (reporting groups, Bromaghin 2008) to reduce bias from prey confounding. diet_pool() can be used for this purpose, though the user needs to manually construct the reporting group matrix rep_grp because the original library was not partitioned by a call to make_prey_part(). As an example, a reporting group matrix to pool estimates for 10 prey types to 7 prey types might look like the following.

rep_grp = matrix(c(1, 0, 0, 0, 0, 0, 0,
                   0, 1, 0, 0, 0, 0, 0,
                   0, 1, 0, 0, 0, 0, 0,
                   0, 0, 1, 0, 0, 0, 0,
                   0, 0, 0, 1, 0, 0, 0,
                   0, 0, 0, 1, 0, 0, 0,
                   0, 0, 0, 0, 1, 0, 0,
                   0, 0, 0, 0, 0, 1, 0,
                   0, 0, 0, 0, 0, 1, 0,
                   0, 0, 0, 0, 0, 0, 1),
                 nrow = 10, byrow = TRUE)

In the above example, prey types 2 and 3 would be pooled into a new prey type 2, prey types 5 and 6 would be pooled into a new prey type 4, and prey types 8 and 9 would be pooled into a new prey type 6.

Leave-one-prey-out analysis

In any type of QFASA analysis, one might reasonably wonder how well the prey library might perform. For example, if the signatures of the individuals within each prey type are similar and quite different from the signatures of other prey types, QFASA might perform very well. Conversely, performance might suffer if some prey types are similar, which Bromaghin et al. (2016b) referred to as “prey confounding”, or if distinct clusters of similar prey occur within some prey types (see Exploring signatures for structure). Although such expectations seem reasonable, it is important to recognize that model performance is largely controlled by the complex interaction of a predator’s diet and characteristics of a prey library (Bromaghin et al. 2015, 2016a, 2016b). For example, performance can be poor if most prey types are distinct, but a predator’s diet is dominated by a subset of prey types that are confounded. The degree to which a predator diet is concentrated on distinct prey types has been referred to as “diet identifiability” (Bromaghin et al. 2016b). Of course, the accuracy of the calibration coefficients is also a critical determinant of model performance.

Despite the above cautions, it seems reasonable to investigate the performance of a prey library as long as one interprets the results with the cautions firmly in mind. There may be several ways this could be done. For example, Haynes et al. (2015) used what is commonly called “prey-on-prey” simulations to evaluate their prey library. The approach implemented in qfasar is a conceptually-similar leave-one-prey-out technique.

A leave-one-prey-out analysis in qfasar is conducted as follows. A single signature is temporarily removed from the prey library, the mean prey-type signatures are computed, the mixture of prey signatures that best approximates the removed signature is estimated as in diet estimation, after which the temporarily removed signature is returned to the library. This is done for all prey signatures in turn. The mean diet estimate is computed for the signatures of each prey type. This analysis performed on an ideal prey library would result in 100% of the diet estimate from each signature being attributed to the prey type to which the signature belongs. The degree of prey confounding within a library is reflected by the extent to which portions of the diet estimates are attributed to incorrect prey types.

In qfasar, a leave-one-prey-out analysis is conducted by a call to the function lopo(). The inputs lopo() requires are obtained by calls to the functions prep_sig() with the prey signature data, or possibly make_prey_part() if a prey library has been partitioned into clusters (see Exploring signatures for structure). The following code snippet contains an example call to lopo().

# Perform a leave-one-prey-out analysis with a partitioned library.
lib_perf <- lopo(sigs = prey_part$sig_part,
                 type = prey_part$type,
                 uniq_types = prey_part$uniq_types,
                 type_ss = prey_part$type_ss,
                 loc = prey_part$loc,
                 dist_meas = my_dist_meas,
                 gamma = my_gamma)
str(lib_perf)

lopo() returns a list containing the following elements:

est - A square numerical matrix containing the mean distribution of estimates among all prey types, by prey-type (rows). Perfect estimation would result in a 1 in each element along the diagonal.
mean_correct - The mean proportion correctly estimated across prey types, unweighted by prey-type sample sizes.
total_correct - The proportion of all signatures correctly estimated, irrespective of prey type designations.
n_conv - A integer vector containing the number of estimates that converged for each prey type.
err_code - An integer error code(0 if no error is detected).
err_message - A string containing a brief summary of the results.

Note:

The statistics returned by lopo() are computed based on the estimates that successfully converge.
Prey types represented by a single signature are skipped.
Because of the numerical optimization involved in a leave-one-prey-out analysis, lopo() can take a few minutes to run with a large prey library.

Pool leave-one-prey-out results to the original prey types

If a prey library has been partitioned using make_prey_part (see Exploring signatures for structure), one might want to pool the results back to the original, unpartitioned prey types to assess the improvement relative to the unpartitioned prey library. The function lopo_pool() performs that task, e.g.,

# Perform a leave-one-prey-out analysis with the original library.
perf_orig <- lopo(sigs = prey_info$sig_scale,
                  type = prey_info$type,
                  uniq_types = prey_info$uniq_types,
                  type_ss = prey_info$type_ss,
                  loc = prey_info$loc,
                  dist_meas = my_dist_meas,
                  gamma = my_gamma)

# Partition the prey library.
prey_part <- make_prey_part(sig = prey_info$sig_scale,
                            clust = my_clust$clust,
                            n_clust = n_clust)

# Perform a leave-one-prey-out analysis with the partitioned library.
perf_part <- lopo(sigs = prey_part$sig_part,
                  type = prey_part$type,
                  uniq_types = prey_part$uniq_types,
                  type_ss = prey_part$type_ss,
                  loc = prey_part$loc,
                  dist_meas = my_dist_meas,
                  gamma = my_gamma)

# Pool results back to the original prey types.
pooled_results <- lopo_pool(est = perf_part$est,
                            n_conv = perf_part$n_conv,
                            type_ss = prey_part$type_ss,
                            pre = prey_part$pool_pre,
                            post = prey_part$pool_post)

# Compare correct allocations.
cbind(diag(perf_orig$est), diag(perf_part$est))

lopo_pool() returns a list containing the following elements, all of which are organized on the basis of the original, unpartitioned prey types:

est - A square numerical matrix containing the mean distribution of leave-one-prey-out estimates among all prey types.
mean_correct - The mean proportion correctly estimated across prey types, unweighted by prey-type sample sizes.
total_correct - The proportion of all signatures correctly estimated.
n_conv - An integer vector containing the number of estimates that converged.
err_code - An integer error code (0 if no error is detected).
err_message - A string containing a brief summary of the results.

Please refer to the documentation for lopo_pool() for additional information.

Signature & calibration coefficient consistency

Predator signatues are assumed to be a linear mixture of mean prey signatures (Iverson et al. 2004) and are modeled as such during diet estimation. Predator signature proportions should therefore be within the range of prey signature proportions and proportions outside the range of the prey proportions are indicative of a violation of one or both of the primary model assumptions, i.e., the prey library is incomplete or the calibration coefficients are inaccurate (Bromaghin et al. 2015, 2016a). Consequently, checking for predator proportions that are outside the range of mean prey proportions is an important diagnostic aid that may provide insights into the reliability of diet estimates.

qfasar uses the function pred_beyond_prey() to identify predator signature proportions that are outside the range of proportions observed among the individual and mean prey signatures. For purposes of diet estimation, proportions outside the range of the mean signatures are most important. However, pred_beyond_prey() also identifies predator proportions that are outside the range of the individual prey proportions for exploratory purposes.

pred_beyond_prey() is designed to be called with inputs returned by the function est_diet() (see Diet estimation). Although it is not conceptually necessary to estimate diets before performing this diagnostic check, doing so ensures that the predator and prey signatures have been transformed to the optimization space (Bromaghin et al. 2015) in which diets have been estimated. An example call to pred_beyond_prey() is contained in the following code snippet.

# Find predator proportions outside range of prey proportions.
beyond_prey <- pred_beyond_prey(pred_sigs = diet_est$pred_sigs,
                                prey_sigs = diet_est$prey_sigs,
                                mean_sigs = diet_est$mean_sigs)
str(beyond_prey)


# Compute percentage out of range by fatty acid, predator, and overall.
mean_by_fa <- apply(X = beyond_prey$beyond_mean, MARGIN = 1, FUN = mean)
mean_by_pred <- apply(X = beyond_prey$beyond_mean, MARGIN = 2, FUN = mean)
mean_all <- mean(beyond_prey$beyond_mean)

pred_beyond_prey() returns a list with the following objects:

beyond_ind - A logical matrix with TRUE indicating that the corresponding predator proportion is outside the range of individual prey proportions.
beyond_mean - A logical matrix with TRUE indicating that the corresponding predator proportion is outside the range of mean prey proportions.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

Although pred_beyond_prey() performs an important diagnostic check, how best to use the results may not always be obvious. Finding that all the predator proportions are within the range of the prey proportions is obviously desirable, but that may not be a realistic expectation when working with real data. A relatively small number of proportions being out of range may not be problematic. One might find that signs of problems are limited to a small number of predators or to one or two fatty acids across many predators. In such cases, a reasonable way forward may not be difficult to devise. However, as the signs of problems become increasingly prevalent, the best way to deal with them may not be clear. For example, Bromaghin et al. (2015) summarised an instance in which nearly half of all predator signature proportions were outside the range of the mean prey proportions. I can offer no general advice for how to proceed in such extreme cases.

Goodness of fit

In diet estimation (Diet estimation), a predator signature is modeled as a linear mixture of prey signatures and diet composition is estimated as the mixture proportions that minimize a measure of distance between observed and modeled predator signatures (Iverson et al. 2004). Consequently, one might expect that how well modeled signatures approximate observed signatures would be of interest, and potentially informative with respect to interpretation of the resulting diet estimates. While that seems likely to be true, methods to assess how well predator signatures are modeled have received almost no attention in the literature (but see Bromaghin et al. 2015).

A potentially informative byproduct of diet estimation is the value of the minimized distance measure associated with each estimated diet. The value will tend to be small when a modeled signature closely approximates an observed signature, and larger when a modeled signature approximates an observed signature less well. However, what value of the minimized distance measure might provide a warning flag for a potentially poor fit is an open question.

The function gof() represents a first attempt to address that question. The algorithm implemented in the function is based on the following logic. First, I assume that a predator consumes the mixture of prey specified by its estimated diet composition. Given that assumption, the expected value of the distance measure is, in a sense, fixed by the combination of mean prey signatures and the assumed diet composition (Bromaghin 2015). Large values of the distance measure are then most likely to occur when there is a high degree of random variation in a predator signature, which results only from the selection of individual prey within prey types if model assumptions are met. Within the framework of simulating predator signatures, variation is maximized when bootstrap sample sizes of the prey signatures used to construct a predator signature are minimized (Bromaghin 2015; Bromaghin et al. 2016b).

Implementing the above logic, gof() randomly samples a single signature from each prey type and weights those signatures by a predator’s estimated diet composition to construct a modeled signature. The modeled signature is then used to estimate diet. If the optimization function converges, the value of the minimized distance measure obtained with the modeled signature is compared to the value of the minimized distance measure obtained while estimating diet with the observed signature. This process is repeated a user-specified number of times (argument boot_gof) and the proportion of the simulated distance measure values that exceed the observed value of the distance measure is computed. gof() therefore constructs a statistic similar to a p-value, with small values being suggestive of predator signatures that may not have been closely approximated during diet estimation.

gof() is designed to be called with objects returned by est_diet() and prep_sig(), as well as associated constants; please refer to the function documentation for details. An example call to gof() is contained in the following code snippet.

# Check goodness-of-fit diagnostics.
gof_diag <- gof(prey_sigs = diet_est$prey_sigs,
                prey_loc = prey_info$loc,
                mean_sigs = diet_est$mean_sigs,
                diet_est = diet_est$est_ind,
                conv = diet_est$conv,
                obj_func = diet_est$obj_func,
                dist_meas = my_dist_meas,
                gamma = my_gamma,
                boot_gof = boot_gof)
str(gof_diag)
hist(gof_diag$p_val)

gof() returns a list containing the following objects:

gof_ss - The number of diet estimates that converged for each predator, therefore producing a simulated value of the distance measure.
p_val - The proportion of the simulated values of the distance measure that exceeded the value produced during diet estimation.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

NOTE: this algorithm implements an idea whose performance has not been explored. It has been included in qfasar to support future research on this topic. Also, like diet estimation itself, this function involves considerable bootstrap resampling and numerical optimization, which can result in very long run times.

Simulation functionality

One of the primary motivations for the development of qfasar was to facilitate future research into potentially improved methods, work that is normally accomplished using computer simulation. The existence of a “toolbox” of ready-to-use simulation functions was expected to make future research easier and faster to complete.

The simulation capabilities of qfasar are summarized in the following subsections.

Predator diet composition
Simulating predator signatures
Generating a ghost prey signature
Adding error to calibration coefficients

Predator diet composition

A majority of simulations designed to investigate the performance of QFASA will probably involve some method of establishing predators with known diet compositions, simulating predator signatures based on those known diets (see Simulating predator signatures), and then evaluating the performance of diet estimators relative to the known “true diet”. Potential methods of establishing known diets, and their implementation in qfasar, are described in the following subsections:

Realistic diets
Random diets
Diet grid

Realistic diets

In some circumstances, one might might wish to work with a small number of “realistic” diets (e.g., Bromaghin et al. 2015). Realistic diets might be taken from the literature, developed using expert opinion, based on biological hypotheses, or constructed to test certain features of a prey library.

qfasar does not implement any special methods to facilitate use of realistic diets. They could be manually entered using the base function concatenate c() or read from a file. In either case, diet proportions should be stored in a matrix in column-major format, i.e., one set of diet proportions per column, to be consistent with qfasar design.

NOTE: it is critical that the order of diet proportions be identical to the alphanumeric order of the prey types (check the object uniq_types returned by prep_sig() Preparing signature data or by make_prey_part() Using a prey partition to verify the order). If the diets are not entered or read in that order, they should be sorted using the base function order() prior to calling qfasar functions.

Given one or more diet compositions, predator fatty acid signatures can be generated using the function make_pred_sigs() (Simulating predator signatures). The diets of such simulated predators can then be estimated (Diet estimation), and the resulting estimates can be compared to the known diet composition to evaluate bias, variance, or other properties.

As an example, the realistic diets of adult male and female Chukchi Sea polar bears used by Bromaghin et al. (2015) could be loaded into memory as follows.

# Enter data.
diets <- matrix(c(0.065, 0.005, 0.051, 0.000, 0.873, 0.000, 0.006,
                  0.207, 0.014, 0.077, 0.000, 0.658, 0.000, 0.044),
                  ncol = 2)
rownames(diets) <- c("Bearded", "Beluga", "Bowhead", "Ribbon", "Ringed", "Spotted", "Walrus")
colnames(diets) <- c("AF", "AM")

# Check results.
print(diets)
colSums(diets)

Random diets

An alternative to working with a small number of realistic diets (Realistic diets) is to work with random diets. This might be useful if there is truly no information on which to base realistic diets or if one wishes to explore the performance of a prey library under a wide range of possible diets (e.g., Bromaghin et al. 2015). One advantage of working with diets distributed throughout the space of all possible diets is that it can reveal broad scale patterns in the performance of diet estimators (e.g., Bromaghin et al. 2016b).

qfasar has the capability of generating diet compositions randomly throughout the space of all possible diets for a given collection of prey types, implemented in the function make_diet_rand(). This function takes a factor of the unique prey-type names, intended to be the object uniq_types returned by a call to the function prep_sig() with a prey data frame (Preparing signature data) or a call to make_prey_part() (Using a prey partition), and the number of diet compositions to generate as input.

An example call to make_diet_rand() is contained in the following code snippet.

# Generate some random diets.
diet_rand <- make_diet_rand(uniq_types = prey_info$uniq_types,
                            n_diet = 100000)
str(diet_rand)

make_diet_rand() returns a list containing the following objects.

diet_rand - A numeric matrix of random diet compositions, in column-major format.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

Diet grid

An alternative to working with a small number of realistic diets (Realistic diets) is to work with a systematic grid of regularly-spaced diet compositions. A diet grid is similar to random diets (Random diets), but diet compositions are evenly, rather than randomly, distributed throughout the space of all possible diets. This method of generating diet compositions might be useful if there is truly no information on which to base realistic diets or if one wishes to explore the performance of a prey library under a wide range of possible diets (e.g., Bromaghin et al. 2015). One advantage of working with diets distributed throughout the space of all possible diets is that it can reveal broad scale patterns in the performance of diet estimators (e.g., Bromaghin et al. 2016b).

qfasar has the capability of generating a grid of diet compositions evenly distributed throughout the space of all possible diets for a given collection of prey types, implemented in the function make_diet_grid(). This function takes a factor of the unique prey-type names, intended to be the object uniq_types returned by a call to the function prep_sig() with a prey data frame (Preparing signature data) or make_prey_part() (**Using a prey partition), and the inverse of the desired spacing between diet proportions as input. For example, an inverse increment of 10 would produce diet compositions with proportions shifted by an increment of 0.1. A small example of a diet grid with three prey types and a diet increment of 0.25 (integer inverse 4), modified from Bromaghin et al (2016b), follows.

Diet	Prey 1	Prey 2	Prey 3
1	1.00	0.00	0.00
2	0.75	0.25	0.00
3	0.75	0.00	0.25
4	0.50	0.50	0.00
5	0.50	0.25	0.25
6	0.50	0.00	0.50
7	0.25	0.75	0.00
8	0.25	0.50	0.25
9	0.25	0.25	0.50
10	0.25	0.00	0.75
11	0.00	1.00	0.00
12	0.00	0.75	0.25
13	0.00	0.50	0.50
14	0.00	0.25	0.75
15	0.00	0.00	1.00

NOTE: The number of possible diets grows extremely quickly as the number of prey types increases and the diet increment decreases, and internal memory limits may be exceeded.

An example call to make_diet_grid() is contained in the following code snippet.

# Generate a diet grid.
diet_grid <- make_diet_grid(uniq_types = prey_info$uniq_types,
                            inv_inc = 8)
str(diet_grid)

make_diet_grid() returns a list containing the following objects.

diet_grid - A numeric matrix of gridded diet compositions, in column-major format.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

Simulating predator signatures

Investigations to study performance characteristics of QFASA may require predator signatures to be simulated so that thay have known properties, such as a known diet composition. Such simulated predator signatures have been called “pseudo-predators” (Iverson et al. 2004). The diet compositions of the simulated predators are then estimated, allowing the observed characteristics of the estimates to be compared with the known characteristics of the specified diet (e.g., Bromaghin et al. 2016a).

Predator signatures are simulated using a procedure that that closely parallels diet estimation. A diet composition must first be specified (see Predator diet composition). Conditioned on the specified diet, a bootstrap sample of signatures from each prey type is drawn (sampling with replacement) and the mean signature of each prey type is computed. A predator signature is constructed as a linear mixture of the mean prey signatures, with the diet composition as the mixture proportions.

An important question in simulating predator signatures is the number of signatures to bootstrap from each prey type. In the literature, the choice of bootstrap sample sizes has been made subjectively. As examples, Iverson et al. (2004) specified three levels of prey sample size (30, 60, and 90) and Bromaghin et al. (2015) used bootstrap sample sizes equal to the sample sizes in the prey library. Some authors apparently used subjectively-determined sample sizes but did not report the values (e.g., Thiemann et al. 2008; Wang et al. 2010; Haynes et al. 2015). However, Bromaghin et al. (2016b) found that bootstrap sample sizes strongly influence both the bias and variance of diet composition estimates. Consequently, the selection of bootstrap sample sizes is an important aspect of simulation design. Using sample sizes that are too small or too large may lead to an overly pessimistic or optimistic assessment, respectively, of model capabilities.

Bromaghin (2015) presented an objective method of establishing a bootstrap sample size for each prey type that produces simulated predator signatures having a realistic level of between-signature variation. That algorithm and the function implementing it are briefly described in a subsequent subsection (Bootstrap sample size algorithm).

qfasar implements the simulation of predator signatures in the function make_pred_sigs(). Predator signatures are first generated in the prey space and then transformed to their natural predator space. make_pred_sigs() is called using the following arguments:

prey_sigs - A matrix of prey signatures in the prey space, intended to be the object sig_scale returned by a call to the function prep_sig(), or the object sig_part returned by a call to the function make_prey_part().
prey_loc - A matrix giving the first and last locations of the signatures of each prey type within prey_sigs, intended to be the object loc returned by a call to one of the functions prep_fa() or make_prey_part().
cc - A numeric vector containing the calibration coefficients, intended to be the object cc returned by a call to one of the functions prep_fa() or cc_aug().
diet - A numeric vector specifying the predator diet composition as proportions.
prey_ss - An integer vector specifying the bootstrap sample size to use for each prey type.
n_pred - An integer specifying the number of predator signatures to generate.

make_pred_sigs() returns a list with the following components:

sim_pred_sigs - A numeric matrix containing simulated predator signatures in the predator space.
err_code - An integer error code (0 if no error is detected).
err_message - A string containing a brief summary of the results.

Please see the code block in the section Bootstrap sample size algorithm for an example call to make_pred_sigs().

Bootstrap sample size algorithm

Bromaghin (2015) presented an objective method of establishing a bootstrap sample size for each prey type that produces simulated predator signatures having a realistic level of between-signature variation. The algorithm, implemented in the function find_boot_ss(), is based on the conceptualization of variation in predator signatures being partitioned into two sources, variation in diet between predators and variation in prey consumed given a common diet. Constructing predator signatures by bootstrapping prey signatures mimics the selection of prey by predators, so the objective is to find the level of variation in predator diets that is attributable only to the selection of prey.

To derive a target level of variation in predator signatures, conditioned on diet, a measure of variation between the estimated diets of all possible pairs of predators is computed. Similarly, a measure of variation between the signatures of all possible pairs of predators is computed. A loess smooth is used to approximate the empirical relationship between these two measures of variance. Conceptually, as the variation between diet estimates approaches zero, the corresponding level of variation between signatures approaches the variation attributable to prey selection given a common diet. In practice, find_boot_ss() finds the pair of predators with the smallest variation between their diets and takes the value of variation from the loess smooth for that pair as the target level of variation.

Given the target level of variation, the algorithm is initialized with a sample size of 1 from each prey type. A sample of predator signatures is generated using those sample sizes and the measure of variance among the signatures is computed. If the variance measure exceeds the target level, the prey type contributing most to the variance measure is identified and its sample size is increased by 1. The algorithm then iterates, increasing the sample size by one at each iteration, until the measure of variation drops below the target level for the first time. Please refer to Bromaghin (2015) for additional details.

find_boot_ss() is designed to take objects returned by est_diet() (see Diet estimation) as input. est_diet() transforms signatures to a common estimation space, as selected by the user, so find_boot_ss() operates in the estimation space used to estimate predator diet composition. Similarly, if diets were estimated with a partitioned prey library, the results of find_boot_ss() will be applicable to the prey types in the partitioned library.

Example calls to find_boot_ss() and make_pred_sigs() are contained in the following code snippet.

# Find bootstrap sample sizes suitable for simulation work.
boot_ss <- find_boot_ss(pred_sigs = diet_est$pred_sigs,
                        pred_diets = diet_est$est_ind,
                        prey_sigs = diet_est$prey_sigs,
                        prey_loc = prey_info$loc,
                        n_pred_boot = 500)
str(boot_ss)
print(boot_ss$boot_ss)

# Simulate predator signatures.
my_sim_preds <- make_pred_sigs(prey_sigs = prey_info$sig_scale,
                               prey_loc = prey_info$loc,
                               cc = new_cc$cc,
                               diet = apply(X = diet_est$est_ind, MARGIN = 1,
                                            FUN = mean),
                               prey_ss = boot_ss$boot_ss,
                               n_pred = 1000)
str(my_sim_preds)

NOTE: the argument n_pred_boot controls the number of predator signatures to simulate. The measure of variance over the simulated predators is compared to the target level of variation. Consequently, the value of n_pred_boot should be large enough to return an estimate of the variance measure that itself has low variance, so that the algorithm returns a numerically stable estimate of suitable sample sizes over repeated function calls. I suspect that the default value of 1000 errs on the side of caution, but this has not been tested.

find_boot_ss() returns a list with the following components:

var_diet - A numeric vector of the measure of variance between the estimated diets of all possible pairs of predators, sorted in increasing order.
var_sig - A numeric vector of the measure of variance between the signatures of all possible pairs of predators, sorted consistently with var_diet.
var_sig_smooth - A loess-smoothed version of var_sig.
mod_sig_var - A numeric vector of the modeled variance between bootstrapped predator signatures at each iteration of the algorithm.
var_target - The target level of variance between bootstrapped predator signatures.
boot_ss - An integer vector of bootstrap sample sizes for each prey type.
err_code - An integer error code (0 if no error is detected).
err_message - A string containing a brief summary of the results.

The most important object returned by find_boot_ss() is the vector of sample sizes for the prey types (boot_ss). However, the other objects may also be of some interest. For example, they can be used to create plots similar to those of Bromaghin (2015).

# Figure 1 of Bromaghin (2015).
plot(x = boot_algo$var_diet, y = boot_algo$var_sig, cex.lab=1.5,
     xlab="Variation in predator diets",
     ylab="Variation in predator signatures")
lines(x = boot_algo$var_diet, y = boot_algo$var_sig_smooth, lwd=3, col="red")


# Figure 2 of Bromaghin (2015).
iteration <- 1:length(boot_algo$mod_sig_var)
plot(x = iteration, y = boot_algo$mod_sig_var,
     pch = 16, cex.lab = 1.5, type = "b",
     xlab = "Iteration",
     ylab = "Modeled predator signature variance")
abline(boot_algo$var_target, 0)

Generating a ghost prey signature

One of the major assumptions of QFASA is that the prey library contains representatives of all prey types consumed by a predator. Bromaghin et al. (2016a) investigated the robustness of diet estimators to violations of this assumption. qfasar implements their method of constructing a signature for a prey type not in the library, termed a ghost prey, in the function make_ghost().

The ghost prey signature is constructed by maximizing the summed distance between the ghost prey signature and the mean prey signatures, while constraining the ghost signature proportions within reasonable bounds to ensure that the signature is somewhat realistic for the prey library. The definition of reasonable bounds is embodied in the argument ghost_err. ghost_err is a proportion greater than or equal to zero and less than 1 used to construct box constraints for each fatty acid in the signatures.

lower bound equals (1 - ghost_err)*(smallest mean prey proportion)
upper bound equals (1 + ghost_err)*(largest mean prey proportion)

The ghost prey signature is obtained by maximizing the summed distance between the signature and the mean prey signatures, constraining the signature to lie within the box constraints and sum to 1.

This method ensures that the ghost prey signature is somewhat distinct from the other prey types, but not so wildly different that it represents a completely different pattern from the other prey types. Although research into suitable values for ghost_err has not been conducted, it is probably advisable to use small to moderate values. Bromaghin et al. (2016) used a value of 0.25. As the value of ghost_err is increased, the resulting ghost prey signature will tend to become increasing different from any prey type in the library. If the value of ghost_err is too small, the ghost prey signature could potentially be similar to one or more prey types in the library, which is probably not what one would want for a simulation study.

The following code snippet contains an example call to make_ghost().

# Specify a level of error.
ghost_err <- 0.25

# Construct a ghost prey signature.
ghost <- make_ghost(prey_sigs = prey_info$sig_scale,
                    loc = prey_info$loc,
                    ghost_err = ghost_err,
                    dist_meas = 1,
                    gamma = NA)
str(ghost)

The function make_ghost() returns a list containing the following objects.

sig - A numeric vector containing the ghost prey signature.
dist - The summed distance between the ghost signature and the mean prey signatures.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

The following code snippet illustrates how to simulate predator signatures incorporating a specified contribution of ghost prey to their diets.

## Generate predator signatures with a contribution from the ghost prey.

# Find bootstrap sample sizes suitable for simulation work.
# There is only one ghost signature, so its sample size should be 1.
boot_algo <- find_boot_ss(pred_sigs = diet_orig$pred_sigs,
                          pred_diets = diet_orig$est_ind,
                          prey_sigs = diet_orig$prey_sigs,
                          prey_loc = prey_info$loc,
                          n_pred_boot = 500)
ghost_ss <- c(boot_algo$boot_ss, 1)


# Append ghost signature to prey data.
ghost_sigs <- cbind(prey_info$sig_scale, ghost$sig)
colnames(ghost_sigs) <- c(colnames(prey_info$sig_scale), "ghost")

last_prey <- prey_info$loc[prey_info$n_types,2]+1
ghost_loc <- rbind(prey_info$loc, c(last_prey, last_prey))
rownames(ghost_loc) <- c(rownames(prey_info$loc), "ghost")
colnames(ghost_loc) <- c("first", "last")


# Generate a random diet for the non-ghost prey components.  These proportions
# could be established in whatever way makes sense in a particular application.
non_ghost <- make_diet_rand(uniq_types = prey_info$uniq_types, n_diet = 1)


# Specify ghost prey component and add to the non-ghost components.
ghost_prop <- 0.15
ghost_diet <- c(non_ghost$diet_rand*(1 - ghost_prop), ghost_prop)
print(ghost_diet)
sum(ghost_diet)


# Generate predator signatures.
ghost_pred_sigs <- make_pred_sigs(prey_sigs = ghost_sigs,
                                  prey_loc = ghost_loc,
                                  cc = new_cc$cc,
                                  diet = ghost_diet,
                                  prey_ss = ghost_ss,
                                  n_pred = 500)
str(ghost_pred_sigs)

Adding error to calibration coefficients

One of the major assumptions of QFASA is that the calibration coefficients are known perfectly. Bromaghin et al. (2016a) investigated the robustness of diet estimators to violations of this assumption. The function add_cc_err() uses the methods of Bromaghin et al. (2016a) to add error to a set of calibration coefficients.

The argument err_bound is used to compute box constraints for the calibration coefficients:

lower bound equals (1 - err_bound)*cc_true
upper bound equals (1 + err_bound)*cc_true

where the argument cc_true is the set of true calibration coefficients provided as input. A uniformly distributed random number is generated between the bounds for each calibration coefficient and the vector of coefficients is scaled so that their sum equals the sum of the true calibration coefficients. The mean relative absolute difference between the true and error-added calibration coefficients is computed as a measure of error for the entire vector (Bromaghin et al 2016a).

The influence of calibration coefficient error can be investigated by generating predator signatures with the true set of calibration coefficients and estimating diet with a set of coefficients incorporating error (Bromaghin et al. 2016a). An example call to add_cc_err() is contained in the following code block.

# Find bootstrap sample sizes suitable for simulation work.
boot_ss <- find_boot_ss(pred_sigs = diet_est$pred_sigs,
                        pred_diets = diet_est$est_ind,
                        prey_sigs = diet_est$prey_sigs,
                        prey_loc = prey_info$loc,
                        n_pred_boot = 500)
                        
                        
# Compute the mean diet to use in a simulation.
mean_diet <- apply(X = diet_est$est_ind, MARGIN = 1, FUN = mean)


# Simulate predator signatures using the true calibration coefficients.
my_sim_preds <- make_pred_sigs(prey_sigs = prey_info$sig_scale,
                               prey_loc = prey_info$loc,
                               cc = new_cc$cc,
                               diet = mean_diet,
                               prey_ss = boot_ss$boot_ss,
                               n_pred = 1000)


# Add error to the calibration coefficients.
cc_err <- add_cc_err(cc_true = new_cc$cc, err_bound = 0.25)
str(cc_err)
print(cc_err$err)


# Estimate diets of the simulated predators using the error-added calibration coefficients.
err_est <- est_diet(pred_sigs = pred_info$sig_scale,
                    pred_uniq_types = pred_info$uniq_types,
                    pred_loc = pred_info$loc,
                    prey_sigs = prey_info$sig_scale,
                    prey_uniq_types = prey_info$uniq_types,
                    prey_loc = prey_info$loc,
                    cc = cc_err$cc,
                    space = my_space,
                    dist_meas = my_dist_meas,
                    gamma = my_gamma,
                    ind_boot = my_ind_boot,
                    mean_meth = my_mean_meth,
                    var_meth = my_var_meth,
                    mean_boot = my_mean_boot)

The function add_cc_err() returns a list containing the following objects.

cc - A numeric vector of calibration coefficients with error incorporated.
err - The mean relative absolute error in the calibration coefficients.
err_code - An integer error code (0 if no error is detected).
err_message - A string contains a brief summary of the execution.

References

Aitchison, J., C. Barcelo-Vidal, J.A. Martin-Fernandez, and V. Pawlowsky-Glahn. 2000. Logratio analysis and compositional distance. Mathematical Geology 32:271-275.

Beck, C.A., S.J. Iverson, W.D. Bowen, and W. Blanchard. 2007. Sex differences in grey seal diet reflect seasonal variation in foraging behaviour and reproductive espenditure: evidence from quantitative fatty acid signature analysis. Journal of Animal Ecology 76:490-502.

Bromaghin, J.F. In Press. qfasar: quantitative fatty acid signature analysis in R.

Bromaghin, J.F. 2015. Simulating realistic predator signatures in quantitative fatty acid signature analysis. Ecological Informatics 30:68-71.

Bromaghin, J.F. 2008. BELS: Backward elimination locus selection for studies of mixture composition or individual assignment. Molecular Ecology Resources 8:568-571.

Bromaghin, J.F., S.M. Budge, and G.W. Thiemann. Under review. Detect and exploit hidden structure in fatty acid signature data.

Bromaghin, J.F., S.M. Budge, and G.W. Thiemann. 2016b. Should fatty acid signature proportions sum to 1 for diet estimation? Ecological Research 31:597-606.

Bromaghin, J.F., S.M. Budge, G.W. Thiemann, and K.D. Rode. 2016a. Assessing the robustness of quantitative fatty acid signature analysis to assumption violations. Methods in Ecology and Evolution 7:51-59.

Bromaghin, J.F., K.D. Rode, S.M. Budge, and G.W. Thiemann. 2015. Distance measures and optimization spaces in quantitative fatty acid signature analysis. Ecology and Evolution 5:1249-1262.

Bromaghin, J.F., M.M. Lance, E.W. Elliott, S.J. Jeffries, A. Acevedo-Gutierrez, and J.M. Kennish. 2013. New insights into the diets of harbor seals (Phoca vitulina) in the Salish Sea revealed by analysis of fatty acid signatures. Fishery Bulletin 111:13–26.

Budge, S.M., S.J. Iverson, and H.N. Koopman. 2006. Studying trophic ecology in marine ecosystems using fatty acids: a primer on analysis and interpretation. Marine Mammal Science 22:759–801.

Budge, S.M., S.J. Iverson, W.D. Bowen, and R.G. Ackman. 2002. Among- and within-species variability in fatty acid signatures of marine fish and invertebrates on the Scotian Shelf, Georges Bank, and southern Gulf of St. Lawrence. Canadian Journal of Fisheries and Aquatic Sciences 59:886-898.

Ghalanos, A., and S. Theussl. 2014. Rsolnp: general non-linear optimization using augmented Lagrange multiplier method. R package version 1:15.

Haynes, T.B., J.A. Schmutz, J.F. Bromaghin, S.J. Iverson, V.M. Padula, and A.E. Rosenberger. 2015. Diet of yellow-billed loons (Gavia adamsii) in Arctic lakes during the nesting season inferred from fatty acid analysis. Polar Biology 38:1239-1247.

Iverson, S.J., C. Field, W.D. Bowen, and W. Blanchard. 2004. Quantitative fatty acid signature analysis: A new method of estimating predator diets. Ecological Monographs 74:211-235.

Maechler, M., P. Rousseeuw, A. Struyf, M. Hubert, and K. Hornik. 2016. cluster: cluster analysis basics and extensions. R package version 2.0.4.

Martin-Fernandez, J.A., J. Palarea-Albaladejo, and R.A. Olea. 2011. Dealing with zeros. P. 43-58 in V. Pawlowsky-Glahn and A. Buccianto, eds. Compositional data analysis: theory and application. John Wiley, Chichester.

Meynier, L., P.C.H. Morel, B.L. Chilvers, D.D.S. Mackenzie, and P. Duignan. 2010. Quantitative fatty acid signature analysis on New Zealand sea lions: model sensitivity and diet estimates. Journal of Mammalogy 91:1484–1495.

Nordstrom, C.A., L.J. Wilson, S.J. Iverson, and D.J. Tollit. 2008. Evaluating quantitative fatty acid signature analysis (QFASA) using harbour seals Phoca vitulina richardsi in captive feeding studies. Marine Ecology Progress Series 360:245–263.

Rode, K.D., E.V. Regehr, D.C. Douglas, G. Durner, A.E. Derocher, G.W. Thiemann, and S.M. Budge. 2014. Variation in the response of an arctic top predator experiencing habitat loss: feeding and reproductive ecology of two polar bear populations. Global Change Biology 20:76–88.

Rosen, D.A.S. & D.J. Tollit. 2012. Effects of phylogeny and prey type on fatty acid calibration coefficients in three pinniped species: implications for the QFASA dietary quantification technique. Marine Ecology Progress Series 467:263–276.

Seber, G.A.F. 1982. The Estimation of Animal Abundance and Related Parameters, second edition. Macmillan Publishing Co., New York.

Stewart, C., and C. Field. 2011. Managing the essential zeros in quantitative fatty acid signature analysis. Journal of Agricultural, Biological, and Environmental Statistics 16:45?69.

Stewart, C., S. Iverson, and C. Field. 2014. Testing for a change in diet using fatty acid signatures. Environmental and Ecological Statistics 21:775-792.

Thiemann, G.W., S.J. Iverson, and I. Stirling. 2008. Polar bear diets and Arctic marine food webs: insights from fatty acid analysis. Ecological Monographs 78:591-613.

Wang, S.W., T.E. Hollmen, and S.J. Iverson. 2010. Validating quantitative fatty acid signature analysis to estimate diets of spectacled and Steller’s eiders (Somateria fischeri and Polysticta stelleri). Journal of Comparative Physiology B 180:125–139.

- Introduction
- Getting started
  - Getting help within R
  - Distribution list
- Data frames
  - Prey signature data
  - Predator signature data
  - Fatty acid suite data
- Preparing for an analysis
  - Preparing fatty acid suite information
  - Preparing signature data
- Distance measures
  - Chi-square distance power parameter
- Estimation spaces
- Diet estimation
  - Estimating individual diet and variance
  - Estimating mean diet and variance
  - Adjusting for fat content
- Diagnostic functionality
  - Exploring signatures for structure
  - Leave-one-prey-out analysis
  - Signature & calibration coefficient consistency
  - Goodness of fit
- Simulation functionality
  - Predator diet composition
  - Simulating predator signatures
  - Generating a ghost prey signature
  - Adding error to calibration coefficients
- References

Package qfasar