What are UMIs and why do I need them?

Ultra-deep sequencing is a sensitive method for identifying very low-frequency variants. However, one caveat of this technique is the high level of duplicate reads generated when the same DNA fragment is sequenced multiple times.

A UMI (Unique Molecular Identifier) is a molecular tag consisting of a short DNA sequence that is used to identify and quantify unique DNA molecules (Fig. 1). These 9 bp tags, ligated to the end of DNA fragments during library preparation, enable duplicate removal. Reads carrying identical UMIs are assumed to originate from the same initial input molecule and are grouped together to form consensus reads. UMIs therefore allow for correction of PCR and sequencing errors and for ultra-low-frequency mutation calling.
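The grouping and consensus idea can be illustrated with a minimal Python sketch. It is not the method of any particular pipeline: the function name `consensus_by_umi`, the `(umi, position, sequence)` read layout, the choice to group on UMI plus mapping position, and the simple per-base majority vote are all assumptions made for illustration. Production tools typically also account for errors within the UMI itself and for base qualities.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads):
    """Collapse reads that share a UMI and mapping position into one consensus.

    `reads` is an iterable of (umi, position, sequence) tuples. This layout is
    hypothetical and chosen only to keep the example self-contained.
    """
    groups = defaultdict(list)
    for umi, position, sequence in reads:
        # Reads with the same UMI at the same position are assumed to be
        # copies of the same original molecule.
        groups[(umi, position)].append(sequence)

    consensus = {}
    for key, sequences in groups.items():
        # Majority vote per base; assumes all reads in a group are equal length.
        consensus[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus


reads = [
    ("ACGTACGTA", 100, "TTGACCA"),
    ("ACGTACGTA", 100, "TTGACCA"),
    ("ACGTACGTA", 100, "TTGTCCA"),  # one copy carries an error at the 4th base
    ("GGCATTCAG", 100, "TTGACCA"),  # same position, different UMI: separate molecule
]
result = consensus_by_umi(reads)
for (umi, position), sequence in result.items():
    print(umi, position, sequence)
```

In this toy input, the three reads sharing the first UMI collapse into a single consensus in which the lone mismatching base is outvoted, while the read with a different UMI is kept as a separate molecule even though it maps to the same position.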



Fig. 1. Diagram of two approaches to dealing with duplicates in ultra-deep sequencing. In the top middle and right boxes, duplicate reads are highlighted by the red lines at each end of the read. In the middle and right lower boxes, duplicate reads are red and unique reads are grey. Each UMI is shown as the coloured block at the start of each read, with the UMI approach correctly discriminating the true duplicates. In this way, the UMI approach allows for ultra-sensitive variant detection in very deep sequencing applications. (F. Bewicke-Copley et al./Computational and Structural Biotechnology Journal 17 (2019) 1348–1359)