The Garvan Institute of Medical Research has partnered with the University of NSW to take genome analysis ‘offline’ by adapting the algorithms that perform DNA analysis so they need far less computing power than current tools.
Medical practitioners fighting the Ebola and Zika viruses in Guinea and Brazil have already used small genome sequencing devices that can clip on to a smartphone, but these devices still require high-performance computer workstations or reliable internet connections to analyse the sequence data they produce.
Devices like the Oxford Nanopore Technologies MinION can generate over a terabyte of data in 48 hours, but they are still not in widespread use. Comparing, or ‘aligning’, DNA from an unknown sample against a reference database to work out what the sample contains requires a computer with around 16 GB of RAM, which is beyond most mid-range laptops and even flagship smartphones.
For cash-strapped medical programs in developing countries or during large-scale outbreaks, that kind of processing power isn’t easy to come by at scale, and a reliable internet connection can be just as hard to find.
In a new paper published in Scientific Reports, a Nature Research journal, Garvan’s Genomic Technologies lead Dr Martin Smith and his team describe a computational method that reduces the memory needed to align sequences from 11 GB to 2 GB, well within the reach of mid-range smartphones.
The researchers adapted the minimap2 program, which aligns DNA sequencing ‘reads’ (the fragments of sequence a device outputs) to a reference library of known genomes.
The reference library is normally converted into an index, a lookup structure that lets the aligner quickly find where in a genome each read is likely to have come from.
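To make the idea of an index concrete, here is a minimal sketch in Python. It is a toy, not minimap2’s actual data structure: minimap2 indexes a sampled subset of k-mers called minimizers and then performs full alignment, whereas this example stores every exact k-mer (substring of length `k`) and simply lets each of a read’s k-mers vote for a starting position. The function names `build_kmer_index` and `locate_read` and the example sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-mer to the (sequence name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in reference.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return dict(index)


def locate_read(read: str, index: dict[str, list[tuple[str, int]]], k: int):
    """Return the best ((sequence name, start), votes) for a read, or None."""
    votes = defaultdict(int)
    for j in range(len(read) - k + 1):
        for name, pos in index.get(read[j:j + k], []):
            # A k-mer at offset j in the read that matches position pos
            # implies the read starts at pos - j on that sequence.
            votes[(name, pos - j)] += 1
    if not votes:
        return None
    return max(votes.items(), key=lambda kv: kv[1])


# Hypothetical two-sequence "reference library".
reference = {"chrA": "ACGTACGTTAGCCATG", "chrB": "TTGACCAGTAGGCAAT"}
index = build_kmer_index(reference, k=4)
hit = locate_read("GTTAGCCA", index, k=4)
print(hit)  # all five k-mers of the read agree on chrA, offset 6
```

The index holds an entry for every k-mer in the reference, so its memory use grows with the size of the reference library. That growth is the problem the Garvan team set out to work around.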
“The challenge, so far, has been that the reference index requires too much computer memory,” Smith said.
“We took the approach of splitting the reference library up into smaller segments, against which we mapped the DNA reads. Once we finished mapping to the smaller segments, we pool results together and tease out the noise, much like creating a panorama by stitching together smaller photos.
“Other algorithms, which take a similar approach of splitting up the reference data, produce a lot of spurious and duplicate mappings – just like overlapping photos in the panorama.
“What we did in this study was fine-tune parameters and select the best mappings across several small indexes.”
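The split-map-merge approach Smith describes can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the team’s implementation: it uses a toy exact k-mer index, a vote count as the alignment score, and a naive round-robin split of the reference into `n_parts` partitions. The function names and example data are hypothetical. A real implementation must also deal with details this sketch ignores, such as keeping scores comparable across partitions.

```python
from collections import defaultdict


def build_index(sequences: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Toy k-mer index for one partition: k-mer -> [(sequence name, offset)]."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def best_hit(read: str, index, k: int):
    """Best (score, sequence name, start) for a read within one partition, or None."""
    votes = defaultdict(int)
    for j in range(len(read) - k + 1):
        for name, pos in index.get(read[j:j + k], []):
            votes[(name, pos - j)] += 1
    if not votes:
        return None
    (name, start), score = max(votes.items(), key=lambda kv: kv[1])
    return score, name, start


def map_partitioned(reads: dict[str, str], reference: dict[str, str], n_parts: int, k: int):
    names = sorted(reference)
    candidates = defaultdict(list)  # read id -> best hit from each partition
    for p in range(n_parts):
        part = {n: reference[n] for n in names[p::n_parts]}
        index = build_index(part, k)  # only this partition's index is built at once
        for read_id, read in reads.items():
            hit = best_hit(read, index, k)
            if hit is not None:
                candidates[read_id].append(hit)
        del index  # drop it before the next partition's index is built
    # Merge: keep the single highest-scoring hit per read (ties broken by
    # sequence name), discarding weaker or duplicate hits from other partitions.
    return {rid: max(hits) for rid, hits in candidates.items()}


reference = {
    "chr1": "ACGTACGTTAGCCATG",
    "chr2": "TTGACCAGTAGGCAAT",
    "chr3": "GGCAAAGGGTTT",
}
reads = {"r1": "GTTAGCCA", "r2": "CCAGTAGG", "r3": "GGCAAAGG"}
result = map_partitioned(reads, reference, n_parts=2, k=4)
print(result)  # r3's weaker hit on chr2 (score 2) loses to its chr3 hit (score 5)
```

Peak memory is set by the largest partition’s index rather than the whole reference’s, which is the trade-off that lets alignment fit on smaller devices. The cost is an extra merge step to remove the spurious and duplicate mappings Smith mentions.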
The approach achieved accuracy similar to the standard methods in current use, which require far more powerful computers.
Using the smaller index segments, Garvan’s new tool reproduced 99.98% of the alignments obtained with the standard approach.
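A figure like 99.98% is a concordance rate: the share of reads that two alignment runs place in the same location. The article does not say exactly how agreement was defined, so the Python sketch below assumes that two alignments agree when they land on the same reference sequence within a hypothetical `tolerance` number of bases. The function name and the example data are also hypothetical.

```python
def concordance(baseline: dict[str, tuple[str, int]],
                candidate: dict[str, tuple[str, int]],
                tolerance: int = 0) -> float:
    """Percentage of reads aligned in `baseline` that `candidate` places on the
    same sequence at a start position within `tolerance` bases."""
    if not baseline:
        return 0.0
    matched = 0
    for read_id, (name, pos) in baseline.items():
        other = candidate.get(read_id)
        if other is not None and other[0] == name and abs(other[1] - pos) <= tolerance:
            matched += 1
    return 100.0 * matched / len(baseline)


# Hypothetical (sequence name, start) results from a single-index run
# and a partitioned-index run over the same four reads.
single = {"r1": ("chr1", 100), "r2": ("chr2", 50), "r3": ("chr1", 900), "r4": ("chr3", 10)}
partitioned = {"r1": ("chr1", 100), "r2": ("chr2", 52), "r3": ("chr2", 40), "r4": ("chr3", 10)}

print(concordance(single, partitioned, tolerance=5))  # r3 disagrees, so 3 of 4 reads match
```

With `tolerance=0` the same data gives 50%, because r2’s start position differs by two bases. Small positional differences like this are why any reported concordance figure depends on how agreement is defined.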
“The potential of lightweight, portable genomic analysis is vast – we hope that this technology will one day be applied in the context of point-of-care microbial infections in remote regions, or in doctors’ hands at the hospital bedside,” Smith said.