Research Article
Corresponding author: Marlon E. Cobos (manubio13@gmail.com)
Academic editor: Janet Franklin
© 2024 Marlon E. Cobos, Hannah L. Owens, Jorge Soberón, A. Townsend Peterson.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Cobos ME, Owens HL, Soberón J, Peterson AT (2024) Detailed multivariate comparisons of environments with mobility oriented parity. Frontiers of Biogeography 17: e132916. https://doi.org/10.21425/fob.17.132916
Multivariate model projections onto conditions that differ from those under which the models were calibrated can result in uncertainty owing to response extrapolation. Therefore, interpretations of models transferred to conditions not analogous to those of model calibration must be done carefully to avoid misleading conclusions. A good practice when producing model transfers is to assess the degree of dissimilarity between calibration and transfer conditions. The mobility oriented parity (MOP) metric is a tool commonly used to quantify such dissimilarities. Our study elucidated the details of MOP, and expanded the interpretability of results when transfer conditions are not analogous to those of model calibration. We implemented a more detailed characterization of non-analogous conditions previously only identified as outside calibration ranges. In addition, we quantified the sensitivity of MOP to the density and position of reference conditions in environmental space, and explored the effect of sample size on MOP results. We show that detailed characterizations of non-analogous conditions can help to determine which variables are different between the two areas and how. This information can be used to understand causes of environmental dissimilarity and to identify variables that could be excluded from ecological niche modeling owing to extrapolation risks associated with them. Our results also show that dissimilarity values calculated in comparisons offer good indicators of how reliable interpretations can be in areas with conditions not analogous to those of reference. The tools developed and implemented in this study provide a useful, quantitatively efficient means of characterizing extrapolation uncertainty in model transfer studies, a best practice when using multivariate ecological niche models to make predictions about species’ distributions.
An extended version of the mobility oriented parity metric helps identify variables for which risks of model extrapolations can be expected.
Distinct MOP metrics used to characterize dissimilarity show comparable general patterns in transfer areas, despite differences in values.
As MOP results are influenced by background sample size, this sample should be defined so that it appropriately characterizes calibration areas.
Calibration area, ecological niche modeling, extrapolation risks, model extrapolation, model transfer, MOP
The ability to transfer predictions of ecological models to scenarios distinct from those under which they were calibrated is a crucial feature that allows researchers to make inferences about a phenomenon under different circumstances (
To aid in interpreting model transfers, various tools have been developed to identify environmental dissimilarity between calibration areas and projections to other areas or time periods (e.g.,
To account for the irregularity of environmental spaces (see example, Fig.
A limitation of how MOP identifies environments outside reference conditions is that it does not identify which variables are outside such conditions. Non-analogous environmental conditions can be found for single or multiple variables in the same geographic area, yet MOP results only show that dissimilarity exists, not which variables are dissimilar. In addition, dissimilar environmental conditions can be at the high or low ends of a variable, information also missing from MOP outputs.
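The kind of per-variable detail described above can be illustrated with a minimal sketch. It assumes that "outside reference conditions" means outside the minimum–maximum range of each variable in the reference set; the function names and toy values are hypothetical and are not the interface of any published package.

```python
# Hypothetical helper names; toy data only.

def variable_ranges(reference):
    """Return (min, max) per variable from a list of reference rows."""
    columns = list(zip(*reference))
    return [(min(col), max(col)) for col in columns]


def out_of_range_detail(point, ranges, names):
    """List which variables fall below or above the reference range."""
    low = [n for v, (lo, hi), n in zip(point, ranges, names) if v < lo]
    high = [n for v, (lo, hi), n in zip(point, ranges, names) if v > hi]
    return {"low": low, "high": high}


# Toy reference conditions: (temperature, precipitation)
reference = [(10.0, 500.0), (15.0, 800.0), (20.0, 1200.0)]
names = ["temperature", "precipitation"]
ranges = variable_ranges(reference)

# Toy conditions of interest in a transfer area
transfer = [(12.0, 600.0), (25.0, 900.0), (5.0, 1500.0)]
details = [out_of_range_detail(p, ranges, names) for p in transfer]
for p, d in zip(transfer, details):
    print(p, d)
```

In this toy case, the second point is flagged only for temperature at the high end, whereas the third is flagged for temperature at the low end and precipitation at the high end, which is the type of information a single dissimilarity flag cannot convey.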
A further aspect of model transfer metrics that has yet to be assessed is the effect of point-density patterns in the cloud of reference environmental conditions (conditions for model calibration) on model transfers. As distances are measured from each point in the conditions of interest to a closest set of points in the reference conditions, density patterns (i.e., how tightly clustered points are in an environmental region) in reference conditions may affect the results obtained. That is, conditions of interest close to high-density reference regions would yield lower dissimilarity values than those close to low-density reference regions, because points in high-density regions are closer to one another than those in low-density regions.
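The expected density effect can be illustrated with a small sketch. It assumes that dissimilarity is the mean Euclidean distance from a condition of interest to the closest percentage of reference points; the function name `mop_distance` and the synthetic point clouds are hypothetical and serve only to show the reasoning.

```python
import math


def mop_distance(point, reference, percent=10):
    """Mean Euclidean distance from `point` to the closest `percent`% of reference points."""
    distances = sorted(math.dist(point, ref) for ref in reference)
    n_closest = max(1, math.ceil(len(distances) * percent / 100))
    return sum(distances[:n_closest]) / n_closest


# Synthetic reference conditions in two dimensions:
# a dense cluster (spacing 0.1) and a sparse cluster (spacing 2).
dense = [(x / 10, y / 10) for x in range(5) for y in range(5)]
sparse = [(10 + 2 * x, 10 + 2 * y) for x in range(3) for y in range(3)]
reference = dense + sparse

# Both conditions of interest coincide with a reference point,
# so they differ only in the density of their neighborhood.
near_dense = mop_distance((0.2, 0.2), reference, percent=10)
near_sparse = mop_distance((12, 12), reference, percent=10)
print(near_dense, near_sparse)
```

Although both points sit exactly on a reference point, the one in the sparse cluster receives a larger value because its nearest neighbors are farther apart.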
Another complication that may affect MOP is the number of background or pseudo-absence points used to characterize the study area (
MOP and other metrics of environmental similarity between conditions of interest and those of reference offer crucial information when interpreting the results of model transfers. If environments in a transfer region are similar to those of the reference region, interpretations of model predictions are straightforward. On the other hand, if the conditions are non-analogous, interpretation of prediction results should be done carefully. More details about what causes dissimilarity and how large it is can help in delimiting areas where interpretations are safe, and may change the way decisions are made during the modeling process. Our general goal with this study was to improve and enrich the information provided by MOP, permitting more detailed interpretations of results. To this end, we (1) extended MOP to provide information on non-analogous variable identity and the nature of their extremity compared to the reference region, (2) explored sensitivity of MOP to parameter settings and differences in the distribution (point density patterns) of reference conditions, and (3) explored effects of the number of points sampled from the set of reference conditions (sample size) when comparing conditions of interest to those of reference (Fig.
We performed three computational challenges with MOP, described below, using data for two geographic scenarios: (1) a calibration area represented by the Lower 48 United States (hereafter “USA”) and a broad transfer area within the extent (60°S–80°N, 135–35°W), which covers most of the Americas (hereafter “the Americas”), and (2) an area for model calibration for the bird Transvolcanic Jay (Aphelocoma ultramarina; hereafter “jay accessible area”), obtained via dispersal simulations by
Environmental conditions comprised bioclimatic variables from the WorldClim database (v2, www.worldclim.org;
In our first challenge (Fig.
We designed our second exploration to illustrate the effects of reference condition density patterns in environmental space on MOP results (Fig.
Representation of environmental spaces of the example cases in two dimensions. Reference conditions (conditions in calibration areas) and conditions of interest (conditions in transfer areas) are represented as points and density kernels. Regions of the environmental space of relevance for analysis are highlighted and described in the conditions of interest of one of the examples. USA = Lower 48 United States of America.
Dissimilarity measurements are influenced by the method used to calculate distances as well as by the way data is transformed before performing analysis (
Finally, we tested the effect of sample size (number of points used to represent the reference conditions in the calibration area) on MOP results using four different samples: an initial analysis with the entire set of calibration area data (40,207 cells), and three subsequent subsets (20,000, 10,000, and 5,000 cells, respectively). The size of the sample of reference conditions can determine whether rare environmental combinations are included in analyses, not only for MOP but also for ENM applications. Therefore, MOP and ENM results can vary depending on this factor. We tested distinct sample sizes to determine whether this parameter can alter results and interpretations of MOP outputs. Analysis times on a laptop computer (Intel Xeon, 2.8 GHz, 8 processors, 16 GB of RAM; 4 processors in parallel) were recorded to serve as a benchmark for analysis optimization. We ran MOP analyses using Euclidean distances for the example case that compares the USA and the Americas, using the set of six variables used in the first challenge, scaled and centered before analysis. Distances were measured in reference to the closest 10% of the conditions in the calibration area.
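Why sample size can matter for the detection of non-analogous conditions can be sketched as follows. The example assumes that non-analogous conditions are those outside the range of at least one variable in the reference sample; the data are synthetic, and the function names are hypothetical rather than those used in our analyses.

```python
import random


def ranges(reference):
    """Return (min, max) per variable for a set of reference rows."""
    return [(min(c), max(c)) for c in zip(*reference)]


def count_non_analogous(points, var_ranges):
    """Count points with at least one variable outside the reference range."""
    return sum(
        any(v < lo or v > hi for v, (lo, hi) in zip(p, var_ranges))
        for p in points
    )


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic reference conditions; extreme values are rare by construction
reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
# Synthetic conditions of interest, spread more widely
transfer = [(rng.uniform(-4, 4), rng.uniform(-4, 4)) for _ in range(500)]

flagged_full = count_non_analogous(transfer, ranges(reference))
subsample = rng.sample(reference, 250)
flagged_sub = count_non_analogous(transfer, ranges(subsample))
print(flagged_full, flagged_sub)
```

Because the ranges of a subsample can never exceed those of the full set, a smaller sample can only keep or enlarge the set of conditions flagged as non-analogous; how much it grows depends on how rare the extreme reference conditions are.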
In challenge one, we attempted to detect regions of the transfer areas where risks from model extrapolation are higher based on more detailed characterizations of non-analogous conditions (Fig.
In challenge two, we explored the effects of parameter settings and density patterns in reference conditions on dissimilarity results. MOP dissimilarity values (distances) obtained for the conditions of interest (Figs
Reducing the percentage of calibration points used as a reference to measure distances decreased dissimilarity in projection environments inside reference ranges (Figs
In challenge three, we assessed the effect of calibration area sample size on MOP results. The most noticeable effect of background sample size was detected in the areas identified as non-analogous (Fig.
Re-scaled dissimilarity (distance) results from the mobility oriented parity (MOP) metric for the example case comparing the United States and the Americas using two variables. MOP analyses were performed using Euclidean or Mahalanobis distances, raw or scaled variables, and with reference to the closest 10, 5, or 1% of environmental conditions in the calibration area. Environmental regions of interest (blocks) are highlighted in red. Color palette is shared among all panels in the figure.
Dissimilarity values (distances, re-scaled 0–1) for regions of interest in environmental space (blocks), obtained from the mobility oriented parity metric (MOP). Analyses were performed using two variables, Euclidean or Mahalanobis distances, raw or scaled variables, and with reference to the closest 10, 5, or 1% of environmental conditions in the calibration area.
Results from the mobility oriented parity metric performed with samples of the calibration area of distinct sizes (40,207; 20,000; 10,000; and 5,000). Analyses were done for the example that compares the Lower 48 United States and the Americas, using 10% of the reference conditions to get mean distances.
Understanding how a model extrapolates under non-analogous conditions can be challenging owing to complications involved in multivariate predictions and the combination of novel conditions in the transfer scenarios (
The information produced with the updated MOP metric about which variables present non-analogous conditions, and at which end of those variables (low or high), can be used to determine whether one or more of these variables should be included in ENMs. Although the decision of which variables to use should be based mainly on their relevance in modeling the phenomenon of interest, MOP results can help in choosing between predictors that have similar implications in the model and provide comparable information. For instance, MOP results showing that a temperature variable presents large areas with non-analogous conditions in a transfer scenario for the future are a first indication that extrapolation risks can be expected (e.g., bio6 in Fig.
All parameters of the MOP analysis explored in our second challenge (distance type, percentage of reference, and variable scaling) had effects on MOP distances. Perhaps the most notable effect on the pattern of distance values derived from using Mahalanobis distances, with shorter values for conditions of interest around the reference conditions compared to results of analyses using Euclidean distances (only visible when distances were re-scaled 0–1; Fig.
The density of calibration points used to characterize environmental conditions, which indicates how common certain conditions are in the reference area, had clear effects in our results from challenge two (Fig.
Our results from challenge three showed that sample size has an effect on how non-analogous conditions are identified. As the details related to which variable showed non-analogous conditions, the end of the variable at which such conditions are found, and the extent of the area with these conditions are relevant for interpretations and for making modeling decisions, defining appropriate sizes for background samples is not a trivial matter. In our example, the largest effect on the patterns of novel conditions was found when the sample size was ~10% of the total points (5,000). Some ENM algorithms and routines allow users to include raster layers from which background for modeling is sampled. As the sample size used to generate background is usually smaller than the total number of pixels in such layers, only this sampled background (including occurrences) should be used as reference conditions in analysis of extrapolation risks. The sample size needed to create appropriate models and perform MOP analyses could depend on multiple factors; for instance, if extreme conditions in the area of interest are rare, a larger random sample size may be required to represent such rare conditions in analyses (
In conclusion, this study addressed the intricacies of model extrapolation under non-analogous conditions, enhancing the interpretability and utility of the MOP metric. We have expanded the metric to characterize non-analogous conditions better, and explored the impact of calculation parameters, which has significantly improved its interpretability. We emphasize the importance of MOP results in assessing risks of spurious extrapolations in ENMs, and perhaps in guiding variable inclusion in these models. The interplay between MOP outcomes and ENM response curves is crucial to more robust assessments of extrapolation risk. Our findings indicate that the density of reference conditions affects MOP results, emphasizing the need for careful explorations of environmental conditions by researchers. Our research also highlights the role of sample size in identifying non-analogous conditions, underscoring the importance of selecting an appropriate sample size for background characterization. These findings contribute to refining ecological niche modeling methods by clarifying how models extrapolate under non-analogous conditions.
We thank the members of the KUENM working group at the University of Kansas Biodiversity Institute for their thinking and work on these topics over the years.
MEC, HLO, JS, and ATP conceived the idea of the project. MEC ran analyses. MEC wrote the manuscript. MEC, HLO, JS, and ATP reviewed and improved the manuscript.
All data used to produce the examples presented here can be obtained with the code provided as part of the example application. R code to reproduce all analyses and figures is available from a GitHub repository (https://github.com/marlonecobos/new_MOP). All analyses required to produce the extended mobility-oriented parity metric were implemented in the new R package “mop” (available at https://CRAN.R-project.org/package=mop).