Selectome Quick Guide
Basic search:
You can use the left box to run a basic search. Basic search accepts
diverse entries, including TreeFam family names, UniProt protein names
or accession numbers, Ensembl gene names, or keywords.
Advanced search:
You can use the advanced search menu to restrict your search to a specific data type and get results faster.
You can also use the advanced search menu to search on a specific phylogenetic branch. Branch names follow the NCBI Taxonomy.
If Positive selection is selected, the query will search only
for subtrees with significant positive selection on the chosen branch.
If it is not selected, the query will search only for subtrees without
positive selection on that branch.
If Duplication is selected, only branches annotated as a gene duplication will be queried.
If it is not selected, only branches annotated as a speciation event will be queried.
Note that the query is run on subtrees, but the results list gene
families, each of which may contain several subtrees. Thus queries for
the Homo/Pan/Gorilla branch with and without positive selection may
return some of the same families, because different subtrees match each query.
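The relation between subtree-level queries and family-level results can be illustrated with a small sketch. This is not Selectome's actual implementation or data model: the `Subtree` record, the `query_families` function and the family names `FAM_A` and `FAM_B` are hypothetical, chosen only to show why one family can appear in the results of two opposite queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtree:
    family: str               # hypothetical family identifier
    branch: str               # taxon name of the queried branch
    positive_selection: bool  # True if the test is significant on that branch
    duplication: bool         # True = duplication event, False = speciation


def query_families(subtrees, branch, positive_selection, duplication=False):
    """Match subtrees, but return the set of families that contain them."""
    return {
        s.family
        for s in subtrees
        if s.branch == branch
        and s.positive_selection == positive_selection
        and s.duplication == duplication
    }


# Toy data (assumed): FAM_A has two subtrees with different test outcomes.
subtrees = [
    Subtree("FAM_A", "Homo/Pan/Gorilla", True, False),
    Subtree("FAM_A", "Homo/Pan/Gorilla", False, False),
    Subtree("FAM_B", "Homo/Pan/Gorilla", False, False),
]

with_selection = query_families(subtrees, "Homo/Pan/Gorilla", True)
without_selection = query_families(subtrees, "Homo/Pan/Gorilla", False)
common = with_selection & without_selection
print(sorted(common))  # families returned by both queries
```

In this toy data set, `FAM_A` is returned by both queries because one of its subtrees shows positive selection on the branch and another does not.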
Selectome does not perform tests on terminal branches, because the
power of the test is lower there, while sequencing or gene prediction
errors may be more influential. We urge caution in interpreting results
even on non-terminal branches for which there are few sequences, such
as the Homo/Pan/Gorilla branch.
List of results:
After a search by keyword or external database ID, Selectome presents the results in two forms, by family and by gene:
In both cases, you will have to click on a TF (TreeFam) accession number to view the results of the positive selection tests.
Results of positive selection tests:
The family datasheet is split into two parts:
1) The description of the family, with accession number, symbol, family
name, description (with access to the paper) and synonyms, if any. This
information comes from TreeFam.
2) The subtree(s). These also come from TreeFam, but with added
information on positive selection.
You can click on gene names to access gene-specific data. Nodes appear
as blue boxes for speciation events, and red boxes for duplication
events. Boxes surrounded by a larger green box mark branches under
positive selection.
Placing the mouse over a node displays the type of event (speciation or
duplication), the selection result with its p-value (if available;
smaller p-values are more significant), the taxon, the branch length
and the bootstrap (phylogenetic support for the branch, in %; higher
bootstrap values indicate stronger support).
Visualization of positively selected sites:
Finally, clicking on the View alignment button will display the protein
alignment using the Jalview Applet. More details on the use of Jalview
are available on their site (Jalview documentation).
Depending on your browser, you should obtain Jalview-specific menus in
the browser (not in Firefox 2.* on MacOSX), which notably allow you to
change the coloring rules.
Default coloring follows the ClustalX code. Hydrophobic residues are in
light blue (AVILWC), basic in red (KR), acidic in purple (ED), polar
in green (NSTQ), aromatic polar in ocean (YH), proline in yellow
(P), glycine in orange (G) and fully conserved cysteine in pink (C).
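For readers who want to reproduce this legend outside the viewer, the residue groups listed above can be written as a lookup table. This is only a sketch of the list given in this guide: the names `COLOR_GROUPS` and `residue_color` are hypothetical, and the real ClustalX scheme also depends on how conserved each alignment column is, which is ignored here except for the cysteine case.

```python
# Residue groups and colors exactly as listed in this guide.
# Residues that the guide does not list (for example M or F) are left uncolored.
COLOR_GROUPS = {
    "light blue": "AVILWC",  # hydrophobic
    "red": "KR",             # basic
    "purple": "ED",          # acidic
    "green": "NSTQ",         # polar
    "ocean": "YH",           # aromatic polar
    "yellow": "P",           # proline
    "orange": "G",           # glycine
}

# Invert the table: one-letter residue code -> color name.
RESIDUE_COLOR = {
    residue: color
    for color, group in COLOR_GROUPS.items()
    for residue in group
}


def residue_color(residue, fully_conserved=False):
    """Return the color name for a residue, or None if the guide lists none.

    Assumption: a fully conserved cysteine takes precedence over the
    hydrophobic group, since C appears in both entries of the legend.
    """
    residue = residue.upper()
    if residue == "C" and fully_conserved:
        return "pink"
    return RESIDUE_COLOR.get(residue)


print(residue_color("K"))
print(residue_color("C", fully_conserved=True))
```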
The annotation below the alignment provides the positions of the selected
sites, limited to branches for which the positive selection test is significant. The height of each bar is proportional to the Bayes
Empirical Bayes (BEB) (Yang, 2005) confidence (higher values are more significant). In addition, bars are color-coded:
light gray for confidence between 50% and 95%, dark gray between 95% and 99%, and black above 99%.
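The color coding of the bars amounts to a simple threshold rule, sketched below. The function name `beb_bar_color` is hypothetical, and two details are assumptions not stated in this guide: how values falling exactly on 95% or 99% are assigned, and that sites below 50% get no bar.

```python
def beb_bar_color(confidence_percent):
    """Map a BEB confidence (in %) to the bar shade described in the guide."""
    if confidence_percent > 99:
        return "black"
    if confidence_percent >= 95:   # assumption: exactly 95% counts as dark gray
        return "dark gray"
    if confidence_percent >= 50:   # assumption: exactly 50% counts as light gray
        return "light gray"
    return None                    # assumption: below 50%, no bar is drawn


for value in (45.0, 89.5, 97.0, 99.6):
    print(value, beb_bar_color(value))
```

A site with a BEB value of 89.5%, for instance, would be drawn in light gray under this rule.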
Important note: Although the test on a branch may significantly detect
positive selection over a relatively large proportion of sites (e.g.
5%), data will often not be sufficient to detect all these sites with
good confidence. Thus in many cases fewer sites will be reported by BEB
than expected from the likelihood test. And in some cases no sites will
be predicted by BEB although positive selection is significant for a
proportion of sites on the branch.
In the following example, we highlighted the four fishes of
the Percomorpha clade (Tetraodon, Fugu, Stickleback and Medaka).
Position 364 on the Percomorpha branch is predicted to have been
under positive selection with high confidence, for a switch to
hydrophobic residues (V and I) from glutamine (Q). The next position,
365, is also predicted to have been under positive selection, for a
switch to histidine (H) from other amino acids (P and F), but with
lower confidence. The BEB value of the latter is 89.5%, as shown at the
bottom of the window when placing the mouse over the BEB bar. At the
same site, there is also selection on the Theria branch for proline (P),
against phenylalanine and histidine (F and H).