Cluster Overview of Archaeal Life
- New to COAL? Please check the Help section!
- Can't see the applet? Make sure applets are enabled in your browser

Clusters must include the following organisms (select)
Note that selecting many organisms may cause the search to be excessively slow
Clusters must fulfill the following consistency scores
Clusters must include the following properties
Clusters must include the following metadata
COAL is an acronym for Cluster Overview of Archaeal Life. The purpose of COAL is to visualize protein orthology and to relate it to additional information derived from the source genomes. This information includes phylogeny, ecotype, metabolism, thermal preference and aerobicity. The protein orthology networks are also subclustered, when possible, using a bipartitioning approach based on spectral clustering. Subclustering helps because protein families differ widely in sequence plasticity: some families, e.g. ABC transporters, are quite flexible, while others, e.g. ribosomal subunits, are very tightly conserved. Given this heterogeneity, hard clustering cutoffs will produce clusters that are unlikely to be biologically relevant for all classes of proteins. The soft approach used here allows the user to stop clustering at a point that makes biological sense.
COAL clusters are of three types: root, stem and leaf. Root clusters are at the top level of the cluster hierarchy and are identified by integer cluster numbers such as 41 or 8. Any number of the form 41.1 or 1144.0.1 denotes a subcluster. For example, 41.1 is one of the two subclusters of root cluster 41. A subcluster that has subclusters of its own is a stem cluster; a subcluster without any subclusters is a leaf cluster.
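The numbering scheme can be illustrated with a short Python sketch. This is not COAL's own code: the function names are hypothetical, and the sample identifiers (including the sibling 41.0) are made up for illustration. It assumes only what is stated above, namely that a root cluster has a plain integer number and that each additional dot-separated component denotes one level of subclustering.

```python
from typing import List, Optional, Set


def parse_cluster_id(cluster_id: str) -> List[int]:
    """Split an identifier such as '1144.0.1' into its integer components."""
    return [int(part) for part in cluster_id.split(".")]


def parent_of(cluster_id: str) -> Optional[str]:
    """Return the identifier one level up, or None for a root cluster."""
    parts = parse_cluster_id(cluster_id)  # also validates the format
    if len(parts) == 1:
        return None
    return ".".join(str(part) for part in parts[:-1])


def classify(cluster_id: str, all_ids: Set[str]) -> str:
    """Label a cluster as 'root', 'stem' or 'leaf'.

    all_ids is a (hypothetical) set of every known cluster identifier;
    it is needed because stem and leaf differ only in having subclusters.
    """
    if parent_of(cluster_id) is None:
        return "root"
    has_children = any(parent_of(other) == cluster_id for other in all_ids)
    return "stem" if has_children else "leaf"


# Made-up identifiers for illustration only.
known = {"41", "41.0", "41.1", "41.1.0", "41.1.1"}
for cid in sorted(known):
    print(cid, classify(cid, known))
```

Note that the identifier alone only distinguishes root clusters from subclusters; telling a stem from a leaf requires knowing whether any further subclusters exist.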
Subclustering of root clusters is performed using spectral clustering (see refs below). We attempt to subdivide each cluster into two subclusters at a time. The separation is considered successful if the second eigenvalue of the Markov transition matrix exceeds a threshold. Note that this is a threshold set on the normalized transitions and not on protein orthology itself. It therefore depends more on the topology of the network than on the orthologies themselves.
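As a rough illustration of the idea, the following Python sketch (using NumPy) builds the Markov transition matrix P = D^-1 W from a symmetric edge-weight matrix W, inspects its second-largest eigenvalue, and splits the nodes only when that eigenvalue exceeds a threshold. It is a minimal sketch under stated assumptions, not the COAL implementation: the threshold value 0.8, the split by the sign of the second eigenvector, and all function names are assumptions chosen for the example.

```python
import numpy as np


def second_eigenpair(weights):
    """Second-largest eigenvalue of P = D^-1 W and its right eigenvector.

    weights: symmetric, non-negative matrix; every node needs at least one edge.
    """
    w = np.asarray(weights, dtype=float)
    degree = w.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    # D^-1/2 W D^-1/2 is symmetric and similar to P, so the eigenvalues match.
    sym = inv_sqrt[:, None] * w * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    lam2 = eigvals[-2]
    vec2 = inv_sqrt * eigvecs[:, -2]  # map back to an eigenvector of P
    return lam2, vec2


def try_bipartition(weights, threshold=0.8):
    """Return a boolean label per node, or None if the split is rejected.

    threshold=0.8 is an assumed value for illustration only.
    """
    lam2, vec2 = second_eigenpair(weights)
    if lam2 <= threshold:
        return None  # network is too tightly connected to separate
    return vec2 >= 0  # assumed rule: split by sign of the eigenvector


# Two triangles joined by one weak edge: expected to separate.
w = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                w[i, j] = 1.0
w[2, 3] = w[3, 2] = 0.01
print(try_bipartition(w))

# A fully connected network of four nodes: expected to be left intact.
print(try_bipartition(np.ones((4, 4)) - np.eye(4)))
```

Because the rows of P are normalized, uniformly rescaling all edge weights leaves the eigenvalues unchanged, which is consistent with the point above that the threshold reflects network topology rather than absolute orthology scores.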
To select a cluster and load it into the applet, enter a cluster number into the Cluster box on the Main page and click Update. The cluster appears as a network of nodes (proteins) connected by edges representing orthology. Information about the cluster is loaded below the applet, along with information about the individual proteins. Initially, networks are not colored. To get more information about a protein, either shift-click its node in the applet or follow its Gene OID link in the list below the applet. You will be taken to the IMG entry for that protein.
Proteins can be highlighted according to the phylogenetic placement of their genomes. To color nodes in the applet and proteins in the list, select the level of phylogeny from the drop-down list on the Main page and click Phylogeny. Nodes are colored, and the list is sorted and set to display the Phylum, Class and Species levels of taxonomy. You can view other metadata by clicking one of the buttons above the list while maintaining the phylogenetic ordering and coloring.
Note that if your selection returns a large number of categories, the coloring will fail.
Currently, there are four categories of metadata in COAL: oxygen usage (e.g. aerobe, anaerobe), metabolism (e.g. chemoorganoheterotroph, chemolithoautotroph), thermal preference (e.g. hyperthermophile, mesophile) and ecotype (e.g. marine, aquatic). These data were taken from the GOLD database, where more detailed information can be found. In addition, COG, PFAM and arCOG annotations can be used to color nodes and proteins.
Nodes and proteins are colored by metadata in the same way as described for phylogeny in the previous section.
All genes that are included in a cluster can be exported to the IMG gene cart by clicking the IMG button in the left column. They can then be analyzed using IMG as normal. You can also go directly to a gene page by either clicking the link in the members table or shift-clicking a node in the graph.
Medusa can be found at SourceForge.
Is the length of an edge any indicator of the strength of similarity between proteins?
No, the edge length depends on the layout only. There is no way to correctly show fixed edge lengths, since we are reducing a multidimensional object to two dimensions. However, the relative strength of orthology is visualized as the opacity of the edge: weak similarities are shown as more translucent, and strong similarities as more opaque.
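To make the opacity idea concrete, here is a small Python sketch of one way a similarity score could be mapped to edge opacity. The linear scaling, the minimum opacity of 0.15 and the function name are assumptions for illustration; the actual mapping used by the applet is not documented here.

```python
def edge_opacity(score, min_score, max_score, min_alpha=0.15):
    """Map a similarity score to an opacity between min_alpha and 1.0.

    Linear scaling and min_alpha=0.15 are assumed, not taken from COAL.
    """
    if max_score <= min_score:
        return 1.0  # all edges equally strong: draw them fully opaque
    fraction = (score - min_score) / (max_score - min_score)
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range scores
    return min_alpha + (1.0 - min_alpha) * fraction


# Hypothetical scores for three edges in one cluster.
scores = [20.0, 60.0, 100.0]
for s in scores:
    print(s, round(edge_opacity(s, min(scores), max(scores)), 3))
```

Keeping a non-zero minimum opacity means that even the weakest edge in a cluster stays visible.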
Got a question? Please contact shooper /at/ lbl.gov