A central object in community ecology is the species abundance distribution. We are interested in the power law and its allies for ranked species abundance data. We collected 12 large data sets, each consisting of many samples. The preliminary fitting results give a robust impression (12 systems at three scales of integration) that the stretched exponential is an interesting alternative to the power law. For further work, advanced statistics are required. Not only ‘our’ data but, quite often, other data as well consist of sample×species cross-tables. Cross-tables also allow ‘within species over samples’ characteristics to be studied. An integrated view of data patterns in multi-sample sets may help to identify generative processes and to formulate a relatively simple model for species abundance data.
A central object in community ecology is the species abundance distribution (SAD). It has been studied for over a century since Raunkiaer (in). The review of an expert group is a benchmark for properties and generative theory. The edited volume on biological diversity is, among others, an update. For applications of the SAD, such as the measurement of biodiversity, we refer to.
The expert group described a law for species abundance data: “When plotted as a histogram of number (or percent) of species on the y-axis vs. abundance on an arithmetic x-axis, the classic hyperbolic, ‘lazy J-curve’ or ‘hollow curve’ is produced, indicating a few very abundant species and many rare species (Fig. 1A). In this form, the law appears to be universal; we know of no multispecies community, ranging from the marine benthos to the Amazonian rainforest, that violates it” (in our Supplement A, we make some remarks on the meaning of ‘community’ and ‘sample’). Several terms are used for a ‘hollow curve’: the distribution is (right-)skewed, long-tailed, has extreme values, or shows rare events. Is there a simple meaningful equation for ‘the’ species abundance distribution? One is inclined to think so if there is a ‘universal law’. The review is steeped in the idea of a relatively simple equation for SADs, but it also presents the opposite view: that different communities have rather different SADs and that groups of species within a community have different SADs, making the community’s SAD a mixed one. The SAD is mostly treated as a histogram, based on the binning of data into frequency classes (for a probability mass or density function). However, the SAD can also be illustrated as a rank abundance or Whittaker plot (see, their Fig. 1c; see also). Ranked data are used for exploratory data analysis. Rank-size plots and (cumulative) probability plots are strongly related (and see our Supplement B). SADs bear similarity to distributions in other fields of science.
Long-tailed distributions of natural and manmade phenomena, in rank-size form (where ‘size’ can be read as ‘abundance’), often show power law behaviour (and see Wikipedia headword ‘Power law’). The ubiquitous power law has been considered for species abundance data. However, ideal power law behaviour is absent or rare: data points do not lie on a straight line in a log/log plot. For this reason, interest in the power law for species abundance data seems to have vanished. Yet imperfect power law behaviour is well documented in other fields of science. Paper is of particular interest. It revisits the data and analysis in a seminal paper and concludes that 9 of 24 data sets conjectured to follow a power law actually do not.
The direct aim is to renew interest in the power law, and especially in its allies such as the stretched exponential, for species abundance data. Further-reaching aims, to be tackled in the future and for which we introduce a framework, are (i) to interest community ecologists in the generative processes of the power law and its allies (which have been studied in other fields of science) and (ii) to complete the quest for a simple yet meaningful equation/model for the SAD.
Rank abundance plots were made and are shown in figure 1. Data points in the log/log plots do not form straight lines; ideal power law behaviour is absent. The data points follow curved lines, concave in almost all cases. As an alternative to the power law, the stretched exponential function was fitted. The parameter values of the fitted function, for the composite samples of the complete sets only, are given in our supplemental table B. No advanced statistics were applied. No comparison with other functions/models was made. The actual data show some deviations from the fitted curves, but the overall result is visually satisfying.
For further work, advanced statistics are required; we refer to. For model comparison and selection, we refer to. For allies of the power law, we refer to (especially chapter 4).
A challenge lies in a remark in the review of the expert group: “Starting in the 1970s and running unabated to the present day, mechanistic models (models attempting to explain the causes of the hollow curve SAD) and alternative interpretations and extensions of prior theories have proliferated to an extraordinary degree”. The power law and its allies are often considered for the degree or connectivity distribution of networks. Species abundance data are retrieved from ecological communities, which are networks. However, the network topology behind species abundance data is not immediately clear. Species abundance data are reminiscent of food-web data. For instance, the interactions between fruits and frugivorous birds can be presented in a cross-table of fruit species×bird species (data of in). From such a table, one can summarize the number of connections of each fruiting species with the bird species and vice versa, giving two connectivity distributions. The one dealing with the number of interactions for fruiting species over bird species is reminiscent of an assemblage of fruiting species, ‘sampled’ by birds. Networks can be generated by a process called preferential attachment (assortative mixing, assortativity (his Fig. 1, as well as a video in its Supplement S3), and see also and Wikipedia headword ‘Preferential attachment’). We suggest linking the quest for a simple distribution equation for the SAD with network research.
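The two connectivity distributions mentioned above can be read off a cross-table by counting non-zero cells per row and per column. The sketch below is only an illustration under our own assumptions: the function name `connectivity_distributions` and the small example table are hypothetical and are not taken from the fruit×bird data referred to in the text.

```python
import numpy as np

def connectivity_distributions(cross_table):
    """Return ranked connection counts for rows and for columns.

    cross_table: 2-D array, rows = e.g. fruit species, columns = e.g. bird
    species; a cell > 0 is treated as one connection (an assumption).
    """
    present = np.asarray(cross_table) > 0
    row_degree = present.sum(axis=1)  # connections per row species
    col_degree = present.sum(axis=0)  # connections per column species
    # Sort in descending order so that rank 1 is the best-connected species.
    return np.sort(row_degree)[::-1], np.sort(col_degree)[::-1]

# Hypothetical 3x3 interaction table (interaction counts, not real data).
example = np.array([[3, 0, 1],
                    [0, 0, 2],
                    [1, 1, 1]])
rows_ranked, cols_ranked = connectivity_distributions(example)
```

The ranked counts can then be plotted against rank in the same way as ranked abundances.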
Not only ‘our’ data but, quite often, other data as well consist of sample×species cross-tables. Such tables provide the opportunity to merge samples into a composite sample for a subset, or for the whole set, as we did. Another opportunity is to study ‘within species over samples’ characteristics. We point to the abundance-occupancy relation, to Taylor’s law (fluctuation scaling) and to sampling theory. Hopefully, all patterns can be integrated and analysed with resampling statistics (see also Wikipedia headwords ‘Nonparametric statistics’ and ‘Resampling (statistics)’) to obtain robust results, especially on the SAD.
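As an illustration of ‘within species over samples’ characteristics, the sketch below computes, per species, the mean abundance, the variance and the occupancy (fraction of samples in which the species occurs), and estimates the exponent of Taylor’s law, variance = α×mean^β, by a straight-line fit on log scales. This is not an analysis we performed; the function names, the log-log least-squares fit and the example table are assumptions made for the illustration.

```python
import numpy as np

def within_species_summary(table):
    """Per-species mean, variance and occupancy over samples.

    table: 2-D array with rows = samples and columns = species.
    """
    table = np.asarray(table, dtype=float)
    mean = table.mean(axis=0)
    var = table.var(axis=0, ddof=1)        # sample variance, needs >= 2 samples
    occupancy = (table > 0).mean(axis=0)   # fraction of samples occupied
    return mean, var, occupancy

def taylor_exponent(mean, var):
    """Least-squares fit of log(var) = log(alpha) + beta * log(mean)."""
    keep = (mean > 0) & (var > 0)          # logs need positive values
    beta, log_alpha = np.polyfit(np.log(mean[keep]), np.log(var[keep]), 1)
    return beta, np.exp(log_alpha)

# Hypothetical table: 4 samples x 3 species (not real data).
example = np.array([[1, 2, 4],
                    [2, 4, 8],
                    [3, 6, 12],
                    [0, 0, 0]])
mean, var, occupancy = within_species_summary(example)
beta, alpha = taylor_exponent(mean, var)
```

In this constructed example the species columns are multiples of one another, so the variance scales with the square of the mean and the fitted exponent is 2; real data would scatter around the fitted line.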
Some of ‘our’ data sets form a time series or a spatial series (fine scale: Mushrooms, Fish, Crustaceans, Fish+Crustaceans, Trees and Rodents). This makes them suitable for studying the process of accumulation (‘sample’ growth, collector’s curve, and species-area relation (see also)) and for looking for autocorrelation. Seasonality aspects have already been described for the Mushrooms set, for Fish, and for Rodents and Annual plants. Spatial patterns for the Trees set have been described in .
Our result gives a robust impression (12 systems at three scales of integration) that the stretched exponential is a possible alternative to the power law. This result may stimulate others to take up again the power law, its allies, and their generative properties for species abundance data.
Our statistics are traditional and limited in scope. Neither advanced goodness-of-fit testing nor statistical comparison with other models was carried out. We considered generative processes but did not study them.
Ideal power law behaviour is absent in the data sets. Data points in log/log plots show curvature, concave in almost all cases. Fitted stretched exponentials match this curvature. Advanced goodness-of-fit testing, model comparison/selection, and the study of generative processes remain to be done; they require expertise that we do not have and for which we seek collaboration.
12 data sets were studied. [I] A data set on Mushrooms, the property of the Swiss Federal Institute for Forest, Snow and Landscape Research WSL, managed by Simon Egli. Data sets on [II] Fish and [III] Crustaceans, the property of Pisces Conservation Ltd., managed by Peter A. Henderson (see also). Fish and Crustaceans were enumerated from the same physical samples. We also studied the ‘whole’ samples: [IV] Fish+Crustaceans. We consider this an integration of 2 (sub) assemblages into a (new) assemblage (see our Supplement A). [V] A data set on tropical rainforest Trees from the Smithsonian Tropical Research Institute’s Center for Tropical Forest Science, managed by Condit et al. (see also). We also used data sets on four different desert assemblages of [VI] Rodents, [VII] Winter annuals, [VIII] Summer annuals and [IX] Ant colonies in the Chihuahuan desert, near Portal, Arizona. [X] A data set on weed Seedlings managed by the Centre for Ecology and Hydrology. [XI] A data set on Brachiopod fossils obtained from Thomas D. Olszewski. He re-enumerated material that had been deposited at the National Museum of Natural History, Washington DC. The material was sampled from Permian deposits spanning a period of approx. 10 Myr in a mountain range of approx. 40 km. The set of 187 samples was presented as consisting of 4 composite assemblages representing four geological formations. We consider the data as 1 composite set on our account. [XII] A data set on cow patty Flies. Characteristics of the data sets are given in the supplemental table. The sets, IV and XI excepted, were collected and studied previously for a characteristic of SADs as histograms, with data binned into frequency classes. Some additional information on the data sets can be found there.
Most sets have samples that were collected in different years (Mushrooms, Fish, Crustaceans, Rodents, Winter and Summer Annuals, Ants, Flies). Within-year sampling was done in different weeks (Mushrooms), in different months (Fish, Crustaceans), or at different locations (Rodents, Winter and Summer Annuals, Ants, Flies). Thus, samples can be assigned to subsets (terminology of set theory: the many samples are objects that form different subsets that form the set (Wikipedia headword ‘Set theory’)). In the other sets (Trees, Seedlings, Brachiopods) a similar structure can be applied. Within the subsets and the set, the samples can be merged, abundances adding up over species, forming composite ‘samples’. We studied (i) samples, (ii) composite samples of subsets and (iii) composite samples of sets, representing 3 scales of integration. Total abundance and species richness values, n and S, of samples and of composite samples of subsets were rank-transformed. The ranks over both parameters were averaged and their median was used to select ‘average’ samples among the primary and the composite samples of subsets, for the figure.
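The merging of samples and the selection of an ‘average’ sample described above can be sketched as follows. This is a minimal illustration, not the procedure as actually coded for the study: the function names, the handling of ties (tied values receive their mean rank) and the rule of picking the sample whose averaged rank lies closest to the median are assumptions, and the example table is hypothetical.

```python
import numpy as np

def composite(table, labels):
    """Merge samples (rows) with the same subset label by summing per species."""
    table = np.asarray(table)
    labels = np.asarray(labels)
    keys = np.unique(labels)
    merged = np.vstack([table[labels == k].sum(axis=0) for k in keys])
    return keys, merged

def average_ranks(values):
    """Rank values from 1 (lowest) upward; tied values share their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(values.size, dtype=float)
    ranks[order] = np.arange(1, values.size + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def pick_average_sample(table):
    """Index of the sample whose mean rank of n and S is closest to the median."""
    table = np.asarray(table)
    n = table.sum(axis=1)          # total abundance per sample
    S = (table > 0).sum(axis=1)    # species richness per sample
    mean_rank = (average_ranks(n) + average_ranks(S)) / 2.0
    return int(np.argmin(np.abs(mean_rank - np.median(mean_rank))))

# Hypothetical cross-table: 5 samples x 3 species, two subsets 'a' and 'b'.
table = np.array([[5, 0, 1], [2, 2, 0], [10, 3, 4], [1, 0, 0], [3, 3, 3]])
subset_keys, subset_composites = composite(table, ["a", "a", "b", "b", "b"])
set_composite = table.sum(axis=0)  # composite sample of the whole set
chosen = pick_average_sample(table)
```

The same selection function can be applied to the composite samples of subsets, giving one ‘average’ sample per scale of integration.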
For the stretched exponential, we followed. The equation is y = (b+a×ln(x))^(1/c), with y for abundance and x for rank (rank 1 assigned to the highest abundance value). It has three parameters: a, b, and c. The function can be rewritten as y^c = b+a×ln(x). This form is linear in ln(x) and can be fitted simply, using least squares. First, in an iterative process, the correlation between ln(x) and y^c is maximized by varying c, resulting in the best-fitting value for c. Then, a linear regression of y^c on ln(x) is performed, resulting in fitted values for a and b. For what they call the intuitive interpretation of the three parameters a, b and c, we refer to.
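The two-step fit described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the study: the function name, the grid of candidate values for c (a simple stand-in for the iterative search) and its range are hypothetical. Because abundance decreases with rank, the slope a is normally negative, so the sketch maximizes the squared correlation between ln(x) and y^c.

```python
import numpy as np

def fit_stretched_exponential(abundances, c_grid=None):
    """Fit y**c = b + a*ln(x) to ranked abundances.

    Step 1: choose c on a grid so that the squared correlation between
            ln(x) and y**c is largest.
    Step 2: regress y**c on ln(x) by least squares to obtain a and b.
    Assumes at least three abundances that are not all equal.
    """
    y = np.sort(np.asarray(abundances, dtype=float))[::-1]  # rank 1 = highest
    x = np.arange(1, y.size + 1)
    ln_x = np.log(x)
    if c_grid is None:
        c_grid = np.linspace(0.01, 2.0, 200)  # assumed search range for c

    best_c, best_r2 = None, -np.inf
    for c in c_grid:
        r = np.corrcoef(ln_x, y ** c)[0, 1]
        if r * r > best_r2:
            best_c, best_r2 = c, r * r

    a, b = np.polyfit(ln_x, y ** best_c, 1)  # slope a, intercept b
    return a, b, best_c, best_r2

# Synthetic check data generated from the equation itself (not real data).
ranks = np.arange(1, 51)
synthetic = (5.0 - 1.0 * np.log(ranks)) ** (1 / 0.3)
a, b, c, r2 = fit_stretched_exponential(synthetic)
```

On synthetic data generated from the equation, the sketch recovers the generating parameters to within the resolution of the grid. A finer grid or a one-dimensional optimizer could replace the grid search without changing the two-step logic.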