Dfam Families Files
===================

Two sets of Dfam families are available for download:

  o The files named Dfam.* include both curated (DF) and uncurated (DR) families.
  o The files named Dfam_curatedonly.* include curated (DF) families only.

Each set is available in several file formats:

  o `*.embl` files contain EMBL-formatted consensus sequences and metadata.
  o `*.hmm` files contain profile Hidden Markov Models (pHMMs) and metadata for
    use with the hmmer[1] suite of tools.
  o `*.h5` files are FamDB[2] files, in the HDF5 file format. FamDB includes
    consensus sequences, pHMMs, metadata, taxonomy structure and nomenclature,
    indexes, and other features.

!!!! In the initial release, the Dfam.h5.gz and Dfam_curatedonly.h5.gz files did not contain all the !!!!!
!!!! taxonomy data needed to merge them with the RepBaseRepeatMasker edition. This has been !!!!!
!!!! corrected, and the original *.h5 files have been moved to the archive directory. The new *.h5 !!!!!
!!!! files are named Dfam-p1.h5.gz and Dfam-p1_curatedonly.h5.gz accordingly. !!!!!
!!!! Please see the RepeatMasker website for further details on the updates to RepeatMasker that are !!!!!
!!!! also necessary to use Dfam 3.6 + RepBaseRepeatMasker with RepeatMasker-4.1.2p1. !!!!!

An md5sum file ( `*.md5sum` ) is provided for each product for download validation.

For more information on the metadata in the EMBL and HMM files, see Dfam's
userman.txt [3].

[1]: http://hmmer.org/
[2]: https://github.com/Dfam-consortium/FamDB/
[3]: https://www.dfam.org/releases/Dfam_3.3/userman.txt

Using Dfam with RepeatMasker
============================

RepeatMasker ships with a copy of Dfam (curated families only). This can be
replaced with a newer version of Dfam, or with the full set of curated and
uncurated families.

RepeatMasker 4.1.0 and earlier read Dfam in the EMBL or HMM format, depending on
the search engine being used. RepeatMasker 4.1.1 and later read Dfam in the
FamDB format.
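
Before replacing the copy of Dfam that ships with RepeatMasker, it may be worth
checking a downloaded file against its `*.md5sum` file. The following is a
minimal sketch in Python, not an official Dfam or RepeatMasker tool. It assumes
the `*.md5sum` files use the usual `md5sum` layout (a hex digest followed by a
file name); the function names are hypothetical and chosen only for
illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5sum_path):
    """Return the digest from the first non-empty line of an md5sum file.

    Assumption: each line looks like "<hex digest>  <file name>".
    """
    for line in Path(md5sum_path).read_text().splitlines():
        if line.strip():
            return line.split()[0].lower()
    raise ValueError(f"no digest found in {md5sum_path}")


def verify_download(data_path, md5sum_path):
    """Return True if the file's MD5 digest matches the recorded one."""
    return md5_of_file(data_path) == expected_md5(md5sum_path)
```

For example, `verify_download("Dfam.h5.gz", "Dfam.h5.gz.md5sum")` would return
True for an intact download. The file names here are assumed; use whichever
product and matching `*.md5sum` file were actually downloaded.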