KAT (The K-mer Analysis Toolkit)

KAT is a suite of tools that generate, analyse and compare k-mer spectra produced from sequence files.

Using k-mer counts, these tools help the user identify or address issues such as:

  • Determining sequencing completeness for assembly
  • Assessing sequencing bias
  • Identifying contaminants
  • Validating genomic assemblies and filtering content

KAT is geared primarily towards high-coverage genomic reads from Illumina sequencers, although it can work with any FASTA or FASTQ sequence file.

At its core, KAT exploits the concept of k-mer spectra (histograms plotting the number of distinct k-mers at each frequency). By studying the properties of a k-mer spectrum, it is possible to discover important information about data quality (level of errors, sequencing biases, completeness of sequencing coverage and potential contamination) and genomic complexity (size, karyotype, levels of heterozygosity and repeat content). Further information can be gleaned through pairwise comparison of spectra, making KAT useful for WGS library comparisons and assembly validation.
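To make the idea of a k-mer spectrum concrete, here is a minimal Python sketch. It is not KAT's or Jellyfish's implementation: the function names `count_kmers` and `kmer_spectrum` and the toy reads are made up for illustration, only the forward strand is counted, and everything is held in an in-memory dictionary, which would not scale to real sequencing data.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer as written (forward strand only) in the sequences.

    Illustrative helper, not part of KAT.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map each frequency to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# Toy reads, invented for this example.
reads = ["ACGTACGT", "CGTACGTA", "TTTTACGT"]
counts = count_kmers(reads, k=4)
spectrum = kmer_spectrum(counts)

# Print the histogram: frequency, then number of distinct k-mers at that frequency.
for frequency in sorted(spectrum):
    print(frequency, spectrum[frequency])
```

In a real high-coverage read set, a histogram of this kind typically shows a large number of distinct k-mers at very low frequency (mostly sequencing errors) and one or more peaks at higher frequency that reflect the sequencing coverage of the genome.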

K-mer counting is a critical element of all KAT tools and is accomplished through an integrated and modified version of Jellyfish2's counting method (http://www.genome.umd.edu/jellyfish.html). Jellyfish was selected for this task because it supports large values of k and is one of the fastest k-mer counting programs currently available.

GitHub URL

KAT

Documentation link

https://kat.readthedocs.org/en/latest/