Selectome: a Database of Positive Selection
  Selectome © 2008/2009   


Selectome Quick Guide

Basic search:

You can use the left box to run a basic search. The basic search accepts several kinds of entries, including TreeFam family names, UniProt protein names or accession numbers, Ensembl gene names, or keywords.

basic_search_TF   basic_search_sp   basic_search_kinase


Advanced search:

You can use the advanced search menu to restrict your search to a specific data type and get results faster.

advanced_search_data

You can also use the advanced search menu to perform a search on a specific phylogenetic branch. Branch names follow the NCBI Taxonomy.

If Positive selection is selected, the query will search only for subtrees with significant positive selection on the chosen branch. If not selected, the query will search only for subtrees without positive selection on that branch.

If Duplication is selected, only branches annotated as a gene duplication will be queried. If not selected, only branches annotated as a speciation event will be queried.

Note that the query is run on subtrees, but the results list gene families, each of which may contain several subtrees. Thus queries for the Homo/Pan/Gorilla branch with and without positive selection may return some of the same families, because different subtrees of one family can match each query.
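The following is a minimal sketch of why the same family can appear in both result lists. It is not Selectome's implementation: the `Subtree` record, the `query_families` function and the family identifiers `FAM_A` and `FAM_B` are hypothetical, and each record is simplified to describe one subtree together with the state of the queried branch within it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtree:
    """Hypothetical record: one subtree and the state of one of its branches."""
    family: str               # made-up family identifier
    branch: str               # taxon name of the branch
    positive_selection: bool  # significant positive selection on that branch?
    duplication: bool         # True for a duplication, False for a speciation


def query_families(subtrees, branch, positive_selection, duplication):
    """Match subtrees against all criteria, then report their families."""
    return sorted({
        s.family
        for s in subtrees
        if s.branch == branch
        and s.positive_selection == positive_selection
        and s.duplication == duplication
    })


# FAM_A has two subtrees: one with and one without positive selection.
subtrees = [
    Subtree("FAM_A", "Homo/Pan/Gorilla", True, False),
    Subtree("FAM_A", "Homo/Pan/Gorilla", False, False),
    Subtree("FAM_B", "Homo/Pan/Gorilla", False, False),
]

with_selection = query_families(subtrees, "Homo/Pan/Gorilla", True, False)
without_selection = query_families(subtrees, "Homo/Pan/Gorilla", False, False)

print(with_selection)     # ['FAM_A']
print(without_selection)  # ['FAM_A', 'FAM_B']
```

Because matching happens per subtree and reporting happens per family, `FAM_A` is returned by both queries.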

advanced_search_branch

Selectome does not perform tests on terminal branches, because the power of the test is lower there and sequencing or gene prediction errors may have more influence. We urge caution in interpreting results even on non-terminal branches for which there are few sequences, such as the Homo/Pan/Gorilla branch shown above.


List of results:

After a search by keyword or external database ID, Selectome presents the results in two views, by gene and by family:

results_gene

results_fam

In both cases, click on a TreeFam (TF) accession number to view the results of the positive selection tests.


Results of positive selection tests:

The family datasheet is split into two parts:

1) The description of the family, with accession number, symbol, family name, description (with access to the paper) and synonyms, if any. This information is taken from TreeFam.

tree_desc

and

2) The subtree(s). These are also taken from TreeFam, but with added information on positive selection.
You can click on gene names to access gene-specific data. Nodes appear as blue boxes for speciation events and red boxes for duplication events. A box surrounded by a larger green box marks a branch under positive selection.
Placing the mouse over a node provides information on the type of event (speciation or duplication), the selection with p-value (if available; smaller p-values are more significant), the taxon, the branch length and the bootstrap (phylogenetic support for the branch, in %; higher bootstrap values indicate stronger support).

tree_pvalue


Visualization of positively selected sites:

Finally, clicking on the View alignment button will display the protein alignment using the Jalview Applet. More details on the use of Jalview are available on their site (Jalview documentation). Depending on your browser, you should obtain Jalview-specific menus in the browser (not in Firefox 2.* on MacOSX), which notably allow you to change the coloring rules.

Default coloring follows the ClustalX code. Hydrophobic residues are in light blue (AVILWC), basic in red (KR), acidic in purple (ED), polar in green (NSTQ), aromatic polar in ocean (YH), proline in yellow (P), glycine in orange (G) and fully conserved cysteine in pink (C).
The annotation below the alignment gives the positions of the selected sites, limited to branches for which the positive selection test is significant. The height of each bar is proportional to the Bayes Empirical Bayes (BEB) (Yang, 2005) confidence (higher values are more significant). In addition, bars are color coded: light grey for confidence between 50% and 95%, dark grey between 95% and 99%, and black above 99%.
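The color coding of the BEB bars can be sketched as a small lookup. This is an illustration only, not the code used by Selectome or Jalview: the function name `beb_bar_shade` is hypothetical, and two details the text leaves open are assumptions here, namely that values of exactly 95% and 99% fall in the dark grey class and that sites below 50% get no bar.

```python
def beb_bar_shade(confidence_percent):
    """Return the bar shade for a BEB confidence given in percent.

    Assumptions (not stated in the guide): exactly 95% and exactly 99%
    are treated as dark grey, and values below 50% have no bar (None).
    """
    if not 0.0 <= confidence_percent <= 100.0:
        raise ValueError("confidence must be between 0 and 100")
    if confidence_percent > 99.0:
        return "black"
    if confidence_percent >= 95.0:
        return "dark grey"
    if confidence_percent >= 50.0:
        return "light grey"
    return None


for value in (30.0, 89.5, 97.0, 99.5):
    print(value, beb_bar_shade(value))
```

A site with a BEB confidence of 89.5%, for instance, falls in the light grey class.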

Important note: Although the test on a branch may detect significant positive selection over a relatively large proportion of sites (e.g. 5%), the data will often not be sufficient to identify all of these sites with good confidence. Thus in many cases BEB will report fewer sites than expected from the likelihood test, and in some cases BEB will predict no sites at all even though positive selection is significant for a proportion of sites on the branch.

In the following example, we highlighted the four fishes of the Percomorpha clade (Tetraodon, Fugu, Stickleback and Medaka). Position 364 on the Percomorpha branch is predicted with high confidence to have been under positive selection, for a switch from glutamine (Q) to hydrophobic residues (V and I). The next position, 365, is also predicted to have been under positive selection, for a switch from other amino acids (P and F) to histidine (H), but with lower confidence. Its BEB value is 89.5%, as shown at the bottom of the window when the mouse is placed over the BEB bar. At the same site, there is also selection on the Theria branch for proline (P) over phenylalanine and histidine (F and H).

pos_sel