Page last updated
27 July 2021

Frequently Asked Rhetorical Questions (FARQ) about MUSSELp Cladomics Pages

What is cladomics?

Cladomics is a relatively new feature of the MUSSEL Project Web Site. The cladome is the set of all the clades for a particular focal taxon. For this site, the cladome refers to the set of clades among the available phylogenetic studies relevant to valid (i.e., currently recognized) freshwater mussel genera and family-group level taxa.

What are the data that are the basis for the MUSSELp cladomics pages?

We maintain a database of the branching topologies of phylogenetic trees with freshwater mussel taxa represented among the terminals. (Other taxa may be incidentally represented as well, but the intention at the time of this typing is to capture data on freshwater mussels.) A cladogram (= phylogenetic tree) is recognized by its unique combination of a publication code and a figure number from that publication. A publication may have multiple cladograms.

Each terminal or internal node is assigned a globally unique key (clade_id). The tree topology is simply stored as a list of nodes and the parent node from which each descends. The root of the tree is simply the deepest node, lacking a parent.

Each terminal node is assigned the corresponding unique identifier of the nominal species or nominal genus in the MUSSELp database. Thus, as taxonomy is updated, the taxonomy of the cladogram is updated as well.
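
As a rough illustration of the storage scheme described above, a cladogram can be held as a table mapping each clade_id to its parent. This is only a sketch: the node numbers, the `taxon_of` mapping, and the helper function names are hypothetical and are not taken from the MUSSELp database.

```python
# Hypothetical parent table: clade_id -> parent clade_id (None for the root).
parent_of = {
    1: None,  # the root is the only node lacking a parent
    2: 1,
    3: 1,
    4: 2,
    5: 2,
}

# Hypothetical link from terminal nodes to nominal-taxon identifiers.
taxon_of = {3: "sp_C", 4: "sp_A", 5: "sp_B"}


def find_root(parent_of):
    """Return the single node that has no parent."""
    roots = [node for node, parent in parent_of.items() if parent is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots[0]


def children_of(parent_of):
    """Invert the parent table into parent -> list of children."""
    children = {}
    for node, parent in parent_of.items():
        if parent is not None:
            children.setdefault(parent, []).append(node)
    return children


def terminals_under(node, children):
    """Collect the terminal nodes that descend from a node."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    found = set()
    for kid in kids:
        found |= terminals_under(kid, children)
    return found
```

Because each terminal stores an identifier rather than a name, relabelling a taxon changes how the cladogram is displayed without touching the stored topology.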

What software is used to handle cladomic data?

The data are captured and managed in FileMaker Pro Advanced, like the MUSSELpdb generally. For analysis, the data fields are exported as tab-delimited text, and custom Python 3.x.x scripts are used to make the cladomic reports for each taxon.
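
For readers curious what reading such an export might look like, here is a minimal sketch using Python's standard csv module. The column names (clade_id, parent_id, pub_code, figure) and the sample rows are assumptions made for illustration; the actual export layout is not documented on this page.

```python
import csv
import io

# Hypothetical tab-delimited export; real column names may differ.
EXPORT = (
    "clade_id\tparent_id\tpub_code\tfigure\n"
    "1\t\tPUB_A\t2\n"
    "2\t1\tPUB_A\t2\n"
    "3\t1\tPUB_A\t2\n"
)


def load_cladograms(handle):
    """Group node rows by (publication code, figure number)."""
    cladograms = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["pub_code"], row["figure"])
        # An empty parent field marks the root of that cladogram.
        parent = int(row["parent_id"]) if row["parent_id"] else None
        cladograms.setdefault(key, {})[int(row["clade_id"])] = parent
    return cladograms


cladograms = load_cladograms(io.StringIO(EXPORT))
```

Keying on the publication code plus figure number mirrors the way a cladogram is identified in the database.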

What information is presented in the cladomics report associated with a particular taxon?

The objective at this time is to simply list the topologies relevant to the focal genus, tribe, subfamily, etc. in question and the publications that report those topologies. These are sorted according to relevance.

For any genus or family-group level taxon to have a cladome to report, there must be publications that included at least one species classified in that taxon according to the MUSSELpdb. If there is more than one species (or genus) terminal classified in the taxon, then a topology showing the relationships among those terminals is also depicted.


Consider the Lampsilini in Graf & Cummings (2006).

In the cladogram depicted in Fig. 2 of that study, finding the node/clade that represents that tribe is simply a matter of 1) finding all the clades that include terminal taxa classified in the Lampsilini, and 2) finding the clade in that set that has all the lampsiline terminals but the minimum number of total terminals. Graf & Cummings (2006) included eight lampsiline species, and they were recovered as monophyletic: the clade with all eight terminals and only those terminals.

If a taxon is not recovered as monophyletic, the total clade size will be greater than the number of species classified in that taxon. For example, Fig. 1 in Campbell & Lydeard (2012) does not depict the Lampsilini as monophyletic because the smallest clade that included five of the lampsiline species they sampled also contained Plectomerus and two species of Reginaia.
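
The two-step search described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual script: the trees use placeholder labels (L1, X1, n1, etc.) rather than data from the cited studies, and the function names are hypothetical.

```python
def terminal_sets(parent_of):
    """Map every node to the set of terminals that descend from it."""
    internal = {p for p in parent_of.values() if p is not None}
    sets = {node: set() for node in parent_of}
    for node in parent_of:
        if node in internal:
            continue
        # Walk from each terminal up to the root, registering it on the way.
        current = node
        while current is not None:
            sets[current].add(node)
            current = parent_of[current]
    return sets


def focal_clade(parent_of, classification, focal_taxon):
    """Return (clade_id, is_monophyletic) for the smallest clade that
    holds every terminal classified in focal_taxon, or None if absent."""
    focal = {t for t, taxon in classification.items() if taxon == focal_taxon}
    if not focal:
        return None
    candidates = [
        (len(members), node)
        for node, members in terminal_sets(parent_of).items()
        if focal <= members
    ]
    size, node = min(candidates)
    # Monophyletic when the clade holds the focal terminals and nothing else.
    return node, size == len(focal)


classification = {
    "L1": "Lampsilini", "L2": "Lampsilini", "L3": "Lampsilini", "X1": "Other",
}

# Placeholder tree in which the focal terminals form their own clade (n1).
tree_a = {"root": None, "n1": "root", "X1": "root",
          "n2": "n1", "L3": "n1", "L1": "n2", "L2": "n2"}

# Placeholder tree in which X1 nests among the focal terminals.
tree_b = {"root": None, "n1": "root", "L3": "root",
          "n2": "n1", "L2": "n1", "L1": "n2", "X1": "n2"}

result_a = focal_clade(tree_a, classification, "Lampsilini")
result_b = focal_clade(tree_b, classification, "Lampsilini")
```

In the second tree the smallest clade holding all three focal terminals has four terminals, so the taxon is reported as not monophyletic.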

Just do that for all the cladograms and all the genera and family-group level taxa. That is the cladome. If you are interested in the phylogenetic evidence supporting the recognition of a particular taxon, the cladomics pages list the relevant publications.

By what criteria are the data sorted?

The clades are sorted primarily according to the number of representative terminals that were included in each publication: from most included representative terminals to the fewest.
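
A minimal sketch of that primary sort is shown here. The rows, labels, and counts are invented for illustration, and the field names are assumptions.

```python
# Hypothetical report rows; the labels and counts are illustrative only.
clades = [
    {"publication": "Study A, Fig. 1", "n_terminals": 5},
    {"publication": "Study B, Fig. 3", "n_terminals": 8},
    {"publication": "Study C, Fig. 2", "n_terminals": 5},
]

# sorted() is stable, so studies with equal counts keep their input order.
ranked = sorted(clades, key=lambda c: c["n_terminals"], reverse=True)
order = [c["publication"] for c in ranked]
```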

However, the top publications are those deemed by a so-far pretty primitive algorithm to have provided the most robust tests of monophyly and sister relationships. The rankings are based on multiple criteria that compare cladograms using statistics describing their ingroup and outgroup sampling. If you are interested in the phylogenetic evidence supporting the recognition of a particular taxon and you only want to look up one or two papers, then the ones at the top are (hopefully) the go-to papers. (All the references are hot-linked to their publication page on this web site, which will link to the publication itself, if the URL is available.)

What statistics are used to compare cladograms?

Five main statistics are applied to distinguish the various studies: 1) number of terminal taxa, 2) number of direct-child terminal taxa, 3) number of outgroup terminals, 4) number of rigorous outgroup terminals, and 5) number of rigorous direct-child outgroups.

Number of terminal taxa. — This is simply the number of species-terminals classified in the taxon in question in a particular cladogram (as described above). For tests of monophyly, analyses with more included species provide a more robust test of monophyly (all else being equal).

Number of direct-child terminal taxa. — “Direct-child” refers to the number of taxa at the next-lowest rank from the focal taxon. In the case of the tribe Lampsilini, the next-lowest rank would be genera. Rather than counting the number of species, this statistic compares the number of direct-child taxa: genera for tribes, tribes for subfamilies, subfamilies for families, etc.

If one study represents a focal tribe with 10 species and another study has only 5 representative species, “number of terminal taxa” would favor the former study as better-sampled. But, if the study with 10 species only samples from two genera whereas the study with 5 species represents more than two genera, then “number of direct-child terminal taxa” would favor the latter study.

Number of outgroup terminals. — While the “number of terminal taxa” refers to the ingroup of the focal taxon, the “number of outgroup terminals” is simply the number of other terminals in the analysis that are not in the ingroup. The more other terminals that are included, the more opportunities there are for monophyly to be disrupted and for accurate sister-group relationships to be discovered. Analyses with more outgroup terminals rank higher than those with fewer (all else being equal).

Number of rigorous outgroup terminals. — Rigorous outgroup terminals are the outgroup terminals classified in the next-highest taxon that also contains the focal taxon. For example, rigorous outgroup terminals for the Lampsilini would be outgroup terminals classified in the Ambleminae, the same subfamily as the ingroup Lampsilini. These are the terminals that provide a meaningful test of the monophyly and sister-group relationships of the Lampsilini.

Number of rigorous direct-child outgroups. — Rigorous direct-child outgroups are the taxa at the same rank as the focal taxon that are classified in the next-highest taxon that also contains the focal taxon. Sticking with the Lampsilini as the focal taxon in question, the “number of rigorous direct-child outgroups” would be the number of other tribes in the Ambleminae that are sampled. The more closely related tribes in the outgroup that are sampled, the more robust the tests of monophyly and sister-group relationships.
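
To make the five counts concrete, here is a sketch that computes them for one focal taxon. It assumes each terminal carries its classification as a small record. The genus labels, "Tribe X/Y/W", and "Subfamily Z" are placeholders, and the function and key names are hypothetical. How the site's algorithm weighs these numbers against one another is not described here, so the sketch only does the counting.

```python
def sampling_stats(terminals, focal, rank, child_rank, parent_rank):
    """Count ingroup and outgroup sampling for one focal taxon."""
    ingroup = [t for t in terminals if t[rank] == focal]
    outgroup = [t for t in terminals if t[rank] != focal]
    if not ingroup:
        return None
    # The next-highest taxon that contains the focal taxon.
    parent_taxon = ingroup[0][parent_rank]
    rigorous = [t for t in outgroup if t[parent_rank] == parent_taxon]
    return {
        "terminals": len(ingroup),
        "direct_child_taxa": len({t[child_rank] for t in ingroup}),
        "outgroup_terminals": len(outgroup),
        "rigorous_outgroup_terminals": len(rigorous),
        "rigorous_direct_child_outgroups": len({t[rank] for t in rigorous}),
    }


# Placeholder terminals: three ingroup, two rigorous outgroup, one other.
terminals = [
    {"genus": "Genus A", "tribe": "Lampsilini", "subfamily": "Ambleminae"},
    {"genus": "Genus A", "tribe": "Lampsilini", "subfamily": "Ambleminae"},
    {"genus": "Genus B", "tribe": "Lampsilini", "subfamily": "Ambleminae"},
    {"genus": "Genus C", "tribe": "Tribe X", "subfamily": "Ambleminae"},
    {"genus": "Genus D", "tribe": "Tribe Y", "subfamily": "Ambleminae"},
    {"genus": "Genus E", "tribe": "Tribe W", "subfamily": "Subfamily Z"},
]

stats = sampling_stats(terminals, "Lampsilini", "tribe", "genus", "subfamily")
```

Here the three ingroup species fall in only two genera, so the direct-child count is 2 even though the terminal count is 3.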

Why do the terminal taxa in the cladograms presented in the cladomics report sometimes look different from what the original authors reported?

First, the taxonomy (synonymy, genera, etc.) may have been updated since the original publication.

Second, since the goal is to report relationships among taxa rather than DNA sequences, species represented by more than one terminal are pruned down to a single terminal representing the species.
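
One way such pruning could work is sketched below. The actual procedure is not described on this page, so the rule used here (keep the first terminal of each species in sorted order, then collapse any internal node left with fewer than two children) is purely an assumption, as are the node labels. The sketch also assumes every terminal appears in `species_of`.

```python
def prune_to_species(parent_of, species_of):
    """Keep one terminal per species and collapse leftover internal nodes."""
    tree = dict(parent_of)
    seen = set()
    for terminal in sorted(species_of):
        species = species_of[terminal]
        if species in seen:
            del tree[terminal]  # drop the redundant terminal
        else:
            seen.add(species)
    changed = True
    while changed:
        changed = False
        children = {}
        for node, parent in tree.items():
            if parent is not None:
                children.setdefault(parent, []).append(node)
        for node in list(tree):
            if node in species_of:
                continue  # terminals are never collapsed
            kids = children.get(node, [])
            if len(kids) == 0:
                # Internal node that lost all of its descendants.
                del tree[node]
                changed = True
                break
            if len(kids) == 1:
                # Attach the only child to the grandparent instead.
                tree[kids[0]] = tree[node]
                del tree[node]
                changed = True
                break
    return tree


parent_of = {"root": None, "n1": "root", "t3": "root", "t1": "n1", "t2": "n1"}
species_of = {"t1": "sp_A", "t2": "sp_A", "t3": "sp_B"}
pruned = prune_to_species(parent_of, species_of)
```

When the terminals of one species do not group together, the choice of which terminal to keep changes the result, so a real implementation would need an explicit rule for that case.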

Also, it is sometimes the case that the analyses are rooted in the original publication in ways that could be improved. Those improvements may be presented on this web site.

What if I am really geeked-up about this and I have more questions?

Send an email to Prof. Daniel Graf in the Department of Biology at the University of Wisconsin-Stevens Point. There is a link to his home page (with contact information) in the footer of this web page.

"Making the world a better place, one mollusk at a time."