Human impacts on global ecosystems are increasing, and there is a growing demand that these activities be appropriately monitored. Monitoring requires measurement of a response metric (‘signal’) that changes maximally and consistently in response to the monitored activity irrespective of other factors (‘noise’), thus maximising the signal-to-noise ratio. Indices derived from time-consuming morphology-based taxonomic identification of organisms are a core part of many monitoring programmes. Metabarcoding is an alternative to morphology-based identification and involves the sequencing of short fragments of DNA (‘markers’) from multiple taxa simultaneously. DNA suitable for metabarcoding includes that extracted from environmental samples (eDNA). Metabarcoding outputs DNA sequences that can be identified (annotated) by matching them against archived annotated sequences. However, sequences from most organisms are not archived, which prevents annotation and potentially limits the use of metabarcoding in monitoring applications. Consequently, there is growing interest in using unannotated sequences as response metrics in monitoring programmes.
We compared the sequences from three commonly used markers (16S (V3/V4 regions), 18S (V1/V2 regions) and COI) and, sampling along steep impact gradients, showed that the 16S and COI sequences were associated with the largest and smallest signal-to-noise ratios, respectively. We trialled four separate, intuitive noise-reduction approaches and demonstrated that removing less frequent sequences improved the signal-to-noise ratio, partitioning an additional 25 % from noise to explanatory factors in non-parametric ANOVA (NPA) and reducing dispersion in the data. For the 16S marker, retaining only the most frequently observed sequence per sample, which resulted in nine sequences across 150 samples, generated a near-maximal signal-to-noise ratio (95 % of the variance explained in NPA). We recommend that NPA, combined with rigorous elimination of less frequent sequences, be used to pre-filter sequences/taxa being used in monitoring applications. Our approach will simplify downstream analysis, for example the identification of key taxa and functional associations.
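The pre-filtering idea can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study. It assumes a samples-by-sequences count table, Bray-Curtis dissimilarity and a one-way permutational partition of variance as a stand-in for NPA. The function names (`keep_top_sequences`, `bray_curtis`, `permanova_r2`) and the synthetic counts are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def keep_top_sequences(counts, k=1):
    """Keep only the k most abundant sequences in each sample (row)."""
    counts = np.asarray(counts, dtype=float)
    filtered = np.zeros_like(counts)
    top = np.argsort(counts, axis=1)[:, -k:]      # column indices of the k largest counts per row
    rows = np.arange(counts.shape[0])[:, None]
    filtered[rows, top] = counts[rows, top]
    # Drop sequences that are no longer observed in any sample.
    return filtered[:, filtered.sum(axis=0) > 0]

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between samples (assumes no empty samples)."""
    diff = np.abs(counts[:, None, :] - counts[None, :, :]).sum(axis=2)
    totals = counts.sum(axis=1)
    return diff / (totals[:, None] + totals[None, :])

def permanova_r2(dist, groups, n_perm=999, seed=0):
    """Return (R2, p): share of variation explained by groups, and a permutation p-value."""
    groups = np.asarray(groups)
    n = len(groups)
    d2 = dist ** 2
    ss_total = d2[np.triu_indices(n, 1)].sum() / n

    def r2(labels):
        ss_within = 0.0
        for g in np.unique(labels):
            idx = np.where(labels == g)[0]
            sub = d2[np.ix_(idx, idx)]
            ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
        return 1.0 - ss_within / ss_total

    observed = r2(groups)
    rng = np.random.default_rng(seed)
    # R2 is monotonic in the pseudo-F statistic for a fixed design,
    # so permuting labels and comparing R2 gives the same p-value.
    hits = sum(r2(rng.permutation(groups)) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic illustration only: 3 impact levels x 4 replicates, 6 sequences.
rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 4)
counts = rng.integers(0, 20, size=(12, 6)).astype(float)   # low-abundance 'noise'
counts[np.arange(12), groups] += 100                        # one dominant sequence per level

r2_all, p_all = permanova_r2(bray_curtis(counts), groups)
top_only = keep_top_sequences(counts, k=1)
r2_top, p_top = permanova_r2(bray_curtis(top_only), groups)
print(f"all sequences: R2={r2_all:.3f}  top sequence only: R2={r2_top:.3f}")
```

In this toy table, discarding the low-abundance sequences leaves one dominant sequence per impact level, so almost all of the remaining variation lies between levels. Real data would need a dissimilarity measure, permutation scheme and filtering threshold chosen to suit the sampling design.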