SeqQuests Findings

Overlooked protein homologies from twilight zone Smith-Waterman analysis

This site presents protein sequence relationships that appear to have been missed by standard annotation pipelines. They were identified through an all-on-all Smith-Waterman comparison of UniProt Swiss-Prot (~570,000 sequences), filtered to remove known relationships. The tree-building method does not produce an evolutionary tree; instead, it picks out the homologies that connect clusters. These weaker links are normally drowned out by the very strong connections within clusters.
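The idea of keeping only the links that connect clusters can be sketched as follows. This is a minimal illustration, assuming a Kruskal-style maximum spanning forest over scored pairs; the function name cluster_linking_pairs, the toy identifiers and the scores are hypothetical, and the method actually used for this site may differ.

```python
def cluster_linking_pairs(pairs):
    """Keep only pairs that join two previously unconnected groups.

    pairs: iterable of (score, id_a, id_b) tuples.
    Pairs are visited from the highest score down, so strong
    within-cluster links build the clusters first, and a weaker
    pair survives only if it bridges two separate clusters.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for score, a, b in sorted(pairs, key=lambda p: p[0], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups
            kept.append((score, a, b))
    return kept


# Toy data: two tight clusters (A*, B*) plus two weak cross-cluster hits.
toy_pairs = [
    (900, "A1", "A2"),
    (880, "A2", "A3"),
    (870, "A1", "A3"),  # redundant: A1 and A3 are already linked
    (850, "B1", "B2"),
    (180, "A3", "B1"),  # weak link that bridges the two clusters
    (160, "A1", "B2"),  # redundant once the clusters are bridged
]
kept = cluster_linking_pairs(toy_pairs)
for score, a, b in kept:
    print(score, a, b)
```

In this toy input the weak 180-scoring pair is retained because it is the strongest link between the two clusters, while the stronger but redundant 870-scoring pair is dropped.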

The focus is on "twilight zone" similarities (15–30% identity, Smith-Waterman scores of 140–300), where genuine homology is often obscured by sequence divergence or compositional bias. Standard tools tend to dismiss these as noise.
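As a rough illustration of the twilight zone window, the sketch below computes percent identity from a gapped pairwise alignment and checks both thresholds. The function names are hypothetical, and the identity definition used here (matches divided by aligned, non-gap columns) is an assumption; other definitions divide by the full alignment length or by the shorter sequence, and will give different values.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def in_twilight_zone(identity_pct, score,
                     identity_range=(15.0, 30.0), score_range=(140, 300)):
    """True if both identity and score fall inside the stated windows."""
    return (identity_range[0] <= identity_pct <= identity_range[1]
            and score_range[0] <= score <= score_range[1])


# Toy alignment: 2 identical columns out of 10 aligned columns.
identity = percent_identity("ACDEFGHIKL", "AXXXXXXIXX")
print(identity, in_twilight_zone(identity, 200))
```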

Browse the 6,579 candidate pairs →

What's here:

6,579 candidate protein pairs that remain after automated filtering.
18 proposed annotation updates distilled from those pairs: cases where sequence homology suggests a UniProt update.

Read the proposed updates →

Highlights include:

Some of the stronger candidates among the 6,579 may be of interest too, though many of them represent similarities that are already known and that slipped through the simple filters. I did not, for example, check for alternative products of the same gene.

Sample alignment

P85828-E2ADG2 s(654) Length: 314/205
P85828: Prohormone-3; Apis mellifera (Honeybee).
E2ADG2: ITG-like peptide {ECO:0000303|PubMed:25641051}; Camponotus floridanus (Florida carpenter ant).

 89 MYTCVALTVVALVSTMHFGVEAWGGLFNRFSPEMLSNLGYGSHGDHISKSGLYQRPLSTSYGYSYDSLEE
    |....|.|.|....|...|||||||||||||||||||||||.||......||.|......||......||
  1 MRVYAAITLVLVANTAYIGVEAWGGLFNRFSPEMLSNLGYGGHGSYMNRPGLLQEGYDGIYGEGAEPTEE

159 VIPCYERKCTLNEHCCPGSICMNVDGDVGHCVFELGQKQGELCRNDNDCETGLMCAEVAGSETRSCQVPI
      |||||||..|.||||||||||..|..|.||...|..||||||.|.||||||||||..|      .
 71 --PCYERKCMYNDHCCPGSICMNFNGVTGTCVSDFGMTQGELCRRDSDCETGLMCAEMSG------H---

229 TSNKLYNEECNVSGECDISRGLCCQLQRRHRQTPRKVCSYFKDPLVCIGPVATDQIKSIVQYTSGEKRIT
           |||..|.||||||||||||||||||.|||||||||||||||||||||||||..||||||||||
130 -------EECAMSSECDISRGLCCQLQRRHRQAPRKVCSYFKDPLVCIGPVATDQIKSVIQYTSGEKRIT

299 GQGNRIFKR
    |||||.|||
193 GQGNRLFKR

The region annotated as "transmembrane" in P85828 (positions 90–112) aligns, without gaps, with the signal peptide of the ant ortholog. This suggests that P85828 is secreted rather than membrane-bound, and that its annotation should be updated.
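To make the coordinate argument concrete, the sketch below maps P85828 positions onto E2ADG2 positions using the first block of the sample alignment (P85828 from position 89, E2ADG2 from position 1). The helper name map_positions is hypothetical; only the two sequence rows and their start positions come from the alignment itself.

```python
def map_positions(query_aln, subject_aln, query_start, subject_start):
    """Map 1-based query positions to subject positions.

    Only columns where both rows hold a residue are recorded;
    a '-' in either row advances the other sequence's counter only.
    """
    mapping = {}
    q, s = query_start, subject_start
    for q_char, s_char in zip(query_aln, subject_aln):
        if q_char != "-" and s_char != "-":
            mapping[q] = s
        if q_char != "-":
            q += 1
        if s_char != "-":
            s += 1
    return mapping


# First block of the sample alignment (no gaps in this block).
P85828_ROW = "MYTCVALTVVALVSTMHFGVEAWGGLFNRFSPEMLSNLGYGSHGDHISKSGLYQRPLSTSYGYSYDSLEE"
E2ADG2_ROW = "MRVYAAITLVLVANTAYIGVEAWGGLFNRFSPEMLSNLGYGGHGSYMNRPGLLQEGYDGIYGEGAEPTEE"

mapping = map_positions(P85828_ROW, E2ADG2_ROW, query_start=89, subject_start=1)

# Positions 90-112 of P85828, the region annotated as "transmembrane".
region = [mapping[pos] for pos in range(90, 113)]
print(region[0], region[-1])  # 2 24
```

Under this alignment, P85828 positions 90–112 correspond to E2ADG2 positions 2–24, at the N-terminus of the ant protein.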