Overlooked protein homologies from twilight zone Smith-Waterman analysis
This site presents protein sequence relationships that appear to have been missed by standard annotation pipelines. They were identified through an all-against-all Smith-Waterman comparison of UniProt Swiss-Prot (~570,000 sequences), filtered to remove known relationships. The tree-building method does not produce an evolutionary tree; rather, it picks out the homologies that connect clusters. These weaker links are normally drowned out by the very strong connections within the clusters.
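For readers unfamiliar with Smith-Waterman, here is a minimal sketch of the local alignment idea. It is not the code used for this site: the match/mismatch values and the linear gap penalty are assumptions chosen for readability, whereas protein comparisons normally use a substitution matrix and affine gap penalties, so the scores it produces are not on the same scale as those quoted here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b).

    Scoring parameters are illustrative assumptions, not the site's settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                  # start a new local alignment
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,  # gap in b
                          H[i][j - 1] + gap)  # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding identical residues."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

Calling smith_waterman on two sequences returns the best-scoring local alignment, and percent_identity gives the identity over its columns. An all-against-all run over ~570,000 sequences needs a heavily optimized implementation; this quadratic pure-Python version is only meant to show what is being computed.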
The focus is on "twilight zone" similarities (15–30% identity, alignment scores of 140–300), where genuine homology is often obscured by sequence divergence or compositional bias. Standard tools tend to dismiss these as noise.
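As an illustration, the twilight zone window can be written as a simple filter over alignment hits. This is a sketch only: the Hit record, its field names, and the inclusive bounds are assumptions made for the example, and the sequence names are placeholders rather than real accessions. Only the 15–30% identity and 140–300 score thresholds come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one pairwise alignment result."""
    query: str
    target: str
    identity: float  # percent identity over the alignment
    score: float     # alignment score


def in_twilight_zone(hit, min_identity=15.0, max_identity=30.0,
                     min_score=140.0, max_score=300.0):
    """True if the hit falls inside the identity and score window.

    Bounds are treated as inclusive, which is an assumption.
    """
    return (min_identity <= hit.identity <= max_identity
            and min_score <= hit.score <= max_score)


# Placeholder hits, invented purely to exercise the filter.
hits = [
    Hit("seqA", "seqB", identity=22.5, score=180.0),  # inside the window
    Hit("seqA", "seqC", identity=65.0, score=900.0),  # strong, within-cluster
    Hit("seqD", "seqE", identity=18.0, score=95.0),   # score too low
]
candidates = [h for h in hits if in_twilight_zone(h)]
```

Here only the first placeholder hit survives; the strong second hit is the kind of within-cluster connection that would otherwise dominate the results.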
Browse the 6,579 candidate pairs →
6,579 candidate protein pairs remain after automated filtering, of which 18 have been distilled into proposed annotation updates: cases where sequence homologies suggest UniProt updates.
Some of the stronger candidates among the 6,579 may also be of interest, although many of them are already-known similarities that slipped through the simple filters. I did not, for example, check for alternative products of the same gene.
Highlights include:
The region annotated as "transmembrane" in P85828 (positions 90–112) aligns perfectly with the signal peptide of the ant ortholog, suggesting that the annotation should be updated and that P85828 is secreted rather than membrane-bound.