DNA Transposon Termini Signatures (version 1.1)

The termini of most Class II DNA transposons, in particular those with terminal inverted repeat sequences (TIRs), are critical to the process of transposition and often show signs of conservation between families within a single class. These conserved patterns may be used to classify elements identified de novo. Non-coding DNA transposons can often be classified only by recognizing these patterns, together with target site duplication (TSD) characteristics, which are not always available. Because many transposons have fragments of other elements embedded within them, even those with apparent coding sequence should generally be classified by their terminal patterns if the two lines of evidence conflict.

Figure 1: Major subgroups of Class II DNA transposons

DNA transposons found in eukaryotic genomes fall into four broad groups [1,2,3] (Figure 1), each with its own set of autonomous and non-autonomous forms (the latter typically non-coding deletion products that retain the end recognition sites). Here we provide logos and HMMs for the terminal 26 bp of most currently recognized subclasses. The strongest signals are carried by TIR DNA transposons, but even Helitrons and some Crypton elements lacking TIRs show signs of base conservation. HMMs for the 5’ and 3’ ends were generated from an ungapped alignment of the first/last 60 bp of all known members with clearly defined termini (Figure 2). Minor modifications were made to some consensus sequences so that each lines up from the true end, most commonly by removing target site duplications. The 3’ ends were reverse-complemented so that, in the case of TIRs, they match the 5’ end. HMMs were built with HMMER (hmmbuild) for the 5’ ends, the 3’ ends, and the combined termini. The logos below represent the information content at each position within the HMM. The full set may be downloaded from https://www.dfam.org/releases/dna_termini_1.1 and will be updated as new families are added to each class.
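The preparation of the terminal sequences described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: the function names (`reverse_complement`, `extract_termini`, `to_stockholm`) are hypothetical, the toy consensus is invented, and only the 60 bp window and the reverse-complementing of the 3’ end come from the text.

```python
# Sketch only: helper names and the toy sequence are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_termini(consensus, window=60):
    """Return (5' end, reverse-complemented 3' end), each `window` bases long.

    Assumes TSDs were already trimmed, so the sequence starts and ends
    at the true element termini.
    """
    if len(consensus) < window:
        raise ValueError("consensus is shorter than the terminal window")
    five_prime = consensus[:window]
    three_prime = reverse_complement(consensus[-window:])
    return five_prime, three_prime


def to_stockholm(named_seqs):
    """Format equal-length sequences as an ungapped Stockholm alignment."""
    lengths = {len(seq) for _, seq in named_seqs}
    if len(lengths) != 1:
        raise ValueError("an ungapped alignment needs equal-length sequences")
    lines = ["# STOCKHOLM 1.0"]
    lines += [f"{name} {seq}" for name, seq in named_seqs]
    lines.append("//")
    return "\n".join(lines) + "\n"


# Toy element with a perfect 8 bp TIR (invented for illustration).
toy = "CAGGGTTCAAAAAAGAACCCTG"
five, three = extract_termini(toy, window=8)

# The "combined termini" set simply pools both ends.
combined = to_stockholm([("toy_5p", five), ("toy_3p", three)])
```

A file written from `to_stockholm` could then be given to hmmbuild, for example `hmmbuild --dna termini.hmm termini.sto`, where the file names are placeholders; the options used for the released HMMs are not stated here.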

Figure 2: Generation of 5’/3’ and combined (termini) HMMs from the consensus sequences of a single DNA transposon subclass.
  1. Piégu, Benoît, et al. "A survey of transposable element classification systems–a call for a fundamental update to meet the challenge of their diversity and complexity." Molecular Phylogenetics and Evolution 86 (2015): 90-109.
  2. Feschotte, Cédric, and Ellen J. Pritham. "DNA transposons and the evolution of eukaryotic genomes." Annual Review of Genetics 41 (2007): 331-368.
  3. Wicker, Thomas, et al. "A unified classification system for eukaryotic transposable elements." Nature Reviews Genetics 8.12 (2007): 973-982.
Table columns: Subclass, Families, Termini, 5' end, 3' end, Notes. The Termini, 5' end and 3' end columns hold sequence logo images and are not reproduced in this text listing.

  Subclass          Families  Notes

Circular dsDNA Intermediate
  Crypton_A         60
  Crypton_C         5
  Crypton_F         22
  Crypton_H         7
  Crypton_I         9
  Crypton_R         6
  Crypton_S         59
  Crypton_V         40

DNA Polymerase
  Maverick          112

Rolling Circle
  Helitron          930

Terminal Inverted Repeat
  Academ_1          85
  Academ_2          3
  Academ_H          18        First C is likely part of the target site that favors CC<cut>.
  CMC_Chapaev       46
  CMC_Chapaev_3     48
  CMC_EnSpm         609
  CMC_Mirage        5         Could be multiple groups here. Too few families to tell yet.
  CMC_Transib       118
  Dada              29
  Ginger            60
  hAT combined      3084
  hAT_Ac            823
  hAT_Blackjack     130
  hAT_Charlie       729
  hAT_Pegasus       7
  hAT_Restless      4
  hAT_Tag1          158
  hAT_Tip100        705
  hAT_hAT1          6
  hAT_hAT5          33
  hAT_hAT6          8
  hAT_hAT19         81
  hAT_hATm          140
  hAT_hATw          15
  hAT_hATx          26
  hAT_hobo          40
  IS3EU             24
  Kolobok_E         8
  Kolobok_H         6
  Kolobok_Hydra     70
  Kolobok_T2        153
  MULE_F            12
  MULE_MuDR         1485
  MULE_NOF          25
  Merlin            79
  Novosib           8
  P                 188
  P_Fungi           16
  PIF_Harbinger     1093
  PIF_HarbS         21        Consensus probably starts with GGGC like other Harbingers; the T could be part of a TSD.
  PIF_ISL2EU        44        Low signal in our dataset. Han et al. (PMID 26120370) has an alternative LOGO.
  PIF_Spy           56
  PiggyBac          310       After removal of TTAA TSDs for many families. There appear to be subgroups with CACGTT…, CACTA… and CTAGTGTCTA termini.
  PiggyBac_A        4
  PiggyBac_X        43
  Sola_1            119
  Sola_2            89
  Sola_3            28
  TcMar combined    2624
  TcMar_Ant1        20        Appears to be two groups present (TA TSDs removed) in this subclass. One with a CAGCTATT motif and another with TTGTGTTT.
  TcMar_Cweed       26
  TcMar_Fot1        220
  TcMar_Gizmo       3
  TcMar_ISRm11      109
  TcMar_m44         45
  TcMar_Mariner     457
  TcMar_Mogwai      3
  TcMar_Pogo        120
  TcMar_Sagan       34
  TcMar_Stowaway    152       The majority (102) start with CTCCCTC.., but the TA may be part of the TIR as many TA...TA consensi are annotated to have TA TSDs.
  TcMar_Tc1         924
  TcMar_Tc2         108
  TcMar_Tc4         58
  TcMar_Tigger      310
  Zator             51
  Zisupton          22
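
Several notes in the table mention removing TSDs and then recognizing subgroups by their leading bases (for example the PiggyBac termini after TTAA removal). The following Python sketch shows one way such a check might be done. The function names and the toy sequences are hypothetical; only the TTAA TSD and the three PiggyBac motifs are taken from the notes. As the TcMar_Stowaway note warns, a flanking dinucleotide may belong to the TIR rather than the TSD, so automatic trimming like this is a judgment call.

```python
# Sketch only: helper names and toy consensus sequences are invented.

def strip_tsd(consensus, tsd):
    """Remove a flanking TSD only when it is present on both ends."""
    flanked = consensus.startswith(tsd) and consensus.endswith(tsd)
    if flanked and len(consensus) > 2 * len(tsd):
        return consensus[len(tsd):-len(tsd)]
    return consensus


def group_by_leading_motif(sequences, motifs):
    """Assign each named sequence to the first motif it starts with."""
    groups = {motif: [] for motif in motifs}
    groups["other"] = []
    for name, seq in sequences.items():
        for motif in motifs:
            if seq.startswith(motif):
                groups[motif].append(name)
                break
        else:
            # No listed motif matched the 5' terminus.
            groups["other"].append(name)
    return groups


# Toy consensus sequences (not real families).
toy_families = {
    "fam1": "TTAACACGTTGGGGAACGTGTTAA",  # flanked by TTAA on both sides
    "fam2": "CACTAGGGGTAGTG",            # already trimmed
    "fam3": "GGGCCCCGCCC",               # matches none of the motifs
}

trimmed = {name: strip_tsd(seq, "TTAA") for name, seq in toy_families.items()}
groups = group_by_leading_motif(trimmed, ["CACGTT", "CACTA", "CTAGTGTCTA"])
```

Requiring the TSD on both ends before trimming is a deliberately conservative choice in this sketch: a consensus that merely begins with the same bases is left untouched.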