Experts on the taxonomy of particular groups are becoming fewer and fewer, particularly among early-career paleontologists and (paleo)biologists, and this includes ammonoid cephalopods. Techniques cannot replace this taxonomic expertise (Engel et al. 2021), but machine learning approaches can make taxonomy more efficient and reproducible, and help pass it on more sustainably. Initially, ammonoid taxonomy was a black box: small differences were sometimes sufficient to erect different species, and superficially similar specimens were grouped in highly idiosyncratic ways (see De Baets et al. 2015 for a review). In the meantime, scientists have embraced more quantitative assessments of conch shape, and of morphology more generally (see Klug et al. 2015 for a more recent review). These approaches still rely on important but time-intensive collection work and on working through daisy chains of more or less accessible papers and monographs, without really knowing how well the approaches perform (other than by expert opinion). In addition, younger scientists are usually trained by more experienced scientists, but this practice is becoming increasingly difficult, which makes it hard to close the taxonomic gap. Fewer and fewer experienced researchers with this kind of expertise are employed, and graduate students or postdocs choose different research or job avenues after their initial training, effectively leading to a leaky pipeline and a taxonomic impediment.
Robust taxonomy and stratigraphy are the basis for all other studies we do as paleontologists/paleobiologists, so Foxon (2021) represents the first step in using supervised and unsupervised machine learning approaches and testing their efficiency on ammonoid conch properties. This pilot study demonstrates that machine learning approaches can be reasonably accurate (60-70%) in identifying ammonoid species (Foxon 2021), at least similar to the accuracy achieved in other mollusk taxa (e.g., Klinkenbuß et al. 2020), and might also be of interest in cases where more traditional methods are not feasible. Novel approaches might even further improve the accuracy, as has been demonstrated for other research objects like pollen (Romero et al. 2020). Further application of machine learning approaches to larger datasets and additional morphological features (e.g., suture line) is now necessary in order to test and improve the robustness of these approaches for ammonoids, as well as to test their performance more broadly within paleontology.
De Baets K, Bert D, Hoffmann R, Monnet C, Yacobucci M, and Klug C (2015). Ammonoid intraspecific variability. In: Ammonoid Paleobiology: From anatomy to ecology. Ed. by Klug C, Korn D, De Baets K, Kruta I, and Mapes RH. Vol. 43. Topics in Geobiology. Dordrecht: Springer, pp. 359–426.
Engel MS, Ceríaco LMP, Daniel GM, Dellapé PM, Löbl I, Marinov M, Reis RE, Young MT, Dubois A, Agarwal I, Lehmann A. P, Alvarado M, Alvarez N, Andreone F, Araujo-Vieira K, Ascher JS, Baêta D, Baldo D, Bandeira SA, Barden P, Barrasso DA, Bendifallah L, Bockmann FA, Böhme W, Borkent A, Brandão CRF, Busack SD, Bybee SM, Channing A, Chatzimanolis S, Christenhusz MJM, Crisci JV, D’elía G, Da Costa LM, Davis SR, De Lucena CAS, Deuve T, Fernandes Elizalde S, Faivovich J, Farooq H, Ferguson AW, Gippoliti S, Gonçalves FMP, Gonzalez VH, Greenbaum E, Hinojosa-Díaz IA, Ineich I, Jiang J, Kahono S, Kury AB, Lucinda PHF, Lynch JD, Malécot V, Marques MP, Marris JWM, Mckellar RC, Mendes LF, Nihei SS, Nishikawa K, Ohler A, Orrico VGD, Ota H, Paiva J, Parrinha D, Pauwels OSG, Pereyra MO, Pestana LB, Pinheiro PDP, Prendini L, Prokop J, Rasmussen C, Rödel MO, Rodrigues MT, Rodríguez SM, Salatnaya H, Sampaio Í, Sánchez-García A, Shebl MA, Santos BS, Solórzano-Kraemer MM, Sousa ACA, Stoev P, Teta P, Trape JF, Dos Santos CVD, Vasudevan K, Vink CJ, Vogel G, Wagner P, Wappler T, Ware JL, Wedmann S, and Zacharie CK (2021). The taxonomic impediment: a shortage of taxonomists, not the lack of technical approaches. Zoological Journal of the Linnean Society 193, 381–387. doi: 10.1093/zoolinnean/zlab072
Foxon F (2021). Ammonoid taxonomy with supervised and unsupervised machine learning algorithms. PaleorXiv ewkx9, ver. 3, peer-reviewed by PCI Paleo. doi: 10.31233/osf.io/ewkx9
Klinkenbuß D, Metz O, Reichert J, Hauffe T, Neubauer TA, Wesselingh FP, and Wilke T (2020). Performance of 3D morphological methods in the machine learning assisted classification of closely related fossil bivalve species of the genus Dreissena. Malacologia 63, 95. doi: 10.4002/040.063.0109
Klug C, Korn D, Landman NH, Tanabe K, De Baets K, and Naglik C (2015). Ammonoid conchs. In: Ammonoid Paleobiology: From anatomy to ecology. Ed. by Klug C, Korn D, De Baets K, Kruta I, and Mapes RH. Vol. 43. Topics in Geobiology. Dordrecht: Springer, pp. 3–24.
Romero IC, Kong S, Fowlkes CC, Jaramillo C, Urban MA, Oboh-Ikuenobe F, D’Apolito C, and Punyasena SW (2020). Improving the taxonomy of fossil pollen using convolutional neural networks and superresolution microscopy. Proceedings of the National Academy of Sciences 117, 28496–28505. doi: 10.1073/pnas.2007324117
DOI or URL of the preprint: https://doi.org/10.31233/osf.io/ewkx9
The manuscript documents an interesting pilot study applying established machine learning approaches to classify ammonoids based on standard conch properties. I feel it is an important way forward to standardize and test ammonoid taxonomy, but some minor but crucial points need to be revised before I can officially recommend this manuscript.
The main points:
Focus on conch parameters: It is ok to focus on conch parameters, as these are easier to obtain and to analyze in a biologically meaningful way, but the manuscript needs a bit more discussion of why this is the case and of how adding further parameters (e.g., suture line, ornamentation such as ribbing) might improve the statistical power to separate species. Many species are not defined by conch parameters alone, so it would be crucial to point out that you are working with only a subset of the characters used to define species, namely those that are more readily available in the literature and easier to analyze quantitatively (see also comments by reviewer 1).
Data: As pointed out in the manuscript, the dataset is limited to 11 species entered into the Paleobiology Database. The sample sizes for individual species are ok (> 50), although they could still be better: some authors have suggested having > 100 specimens available when including multiple ontogenetic stages, etc. As an ammonoid worker who has worked on intraspecific variation, I can highlight that data for many more species would be available in the primary literature (a substantial part is still missing from the PBDB). I must admit that, particularly in older literature, measurements would need to be extracted from graphs, and we still have some way to go before all paleontologists make this kind of data available as standard practice. Ideally, you should try to compile some additional data from the primary literature; this would help you to better understand what I mean and would broaden the scope of your analysis. As this is a pilot study, focusing on 11 species with samples > 50 could still be ok, but it would be crucial to highlight which primary references yielded data for particular species. This also becomes crucial because, for some species, data from multiple references are merged, representing multiple stratigraphic and geographic intervals (and likely also different degrees of preservation). This could, for example, explain the poorer performance for particular species like Owenites koeneni, for which the specimens derive from different localities and might also represent different preservations and ages. Please also write species names in italics, as is customary.
Performance of particular methods and species: The original authors might have assigned all their specimens to a particular species (e.g., Owenites koeneni), but they mostly did not statistically evaluate how the conch parameters of their specimens compared with those from other localities, and some even highlight qualitative differences with material from other localities. The homogeneity of conch parameters, and their use to define species, might therefore be compromised to some degree even before machine learning approaches are applied. To place the performance of the methods into context for particular species, it would be crucial to add at least the primary reference providing the data, the age range (single bed, biozone, etc.) and the geographic scope (same locality, continent, etc.), so that such potential issues can be assessed more transparently. In the discussion you focus on the performance of the methods, but it would also be crucial to highlight which species are consistently picked up and which ones are not, to better understand the impact of species definition. Which ones are often or sometimes merged, and which ones are sometimes or often oversplit? This would allow a better discussion and understanding of how species definition and the homogeneity of conch parameters might affect the performance of the methods. At first glance, Owenites koeneni in particular seems to perform peculiarly, and it is also one of the species whose measurements derive from several continents and publications. It would therefore be crucial to treat this at greater length in the discussion.
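To illustrate the kind of per-species breakdown I have in mind, here is a minimal sketch in Python. It assumes that the true and the predicted species labels of the test specimens are available as two lists; the function name `per_species_summary`, the placeholder names "Species A" and "Species B", and the toy labels are all hypothetical and are not taken from the manuscript or its results. For each species it reports the fraction of specimens recovered correctly and the species it is most often mistaken for.

```python
from collections import Counter, defaultdict


def per_species_summary(true_labels, predicted_labels):
    """Return {species: (recall, most frequent wrong label or None)}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals = Counter(true_labels)      # specimens per true species
    hits = Counter()                   # correctly classified specimens
    confusions = defaultdict(Counter)  # true species -> wrong predictions
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            hits[truth] += 1
        else:
            confusions[truth][guess] += 1
    summary = {}
    for species, n in totals.items():
        wrong = confusions[species].most_common(1)
        summary[species] = (hits[species] / n, wrong[0][0] if wrong else None)
    return summary


# Toy labels invented purely for illustration (NOT results from the manuscript).
toy_true = ["Owenites koeneni"] * 4 + ["Species A"] * 3 + ["Species B"] * 3
toy_pred = [
    "Owenites koeneni", "Species A", "Species A", "Species B",
    "Species A", "Species A", "Species A",
    "Species B", "Species B", "Species A",
]

summary = per_species_summary(toy_true, toy_pred)
for species, (recall, confused_with) in sorted(summary.items()):
    print(f"{species}: recall={recall:.2f}, most often confused with {confused_with}")
```

A table of this kind, placed next to the primary reference, age range and geographic scope of each species, would make it easier to see whether poorly recovered species are also the ones compiled from heterogeneous sources.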
Code availability and reproducibility: It has become standard practice to share the code at least upon publication (see Reviewer 2). Ideally, this should even be done during the review process, as it would allow reviewers to verify the results, but I can to some degree understand the reluctance to do so before publication. Repositories are, however, available for this purpose (e.g., GitHub) that allow embargoes and access restrictions to be placed on the code and data.
Please address these and other points raised by the reviewers and myself (see annotated pdf). I look forward to seeing the revised manuscript.