The cost of sequencing a human genome has dropped from $3 billion to about $1000, and the time it takes has shrunk from more than a decade to days, revolutionising the daily work of cancer researchers like Professor Dominik Beck.

Beck, a research fellow and senior lecturer at the UTS Centre for Health Technologies and the Lowy Cancer Research Centre at UNSW, told iTnews that where once "we required specialised sequencing labs, we can now [obtain DNA samples] with the power of a desktop computer".
His current projects aim to better understand the biology of acute myeloid leukaemia (AML) and improve the life expectancy of patients with blood-related cancers.
Beck's work focuses on identifying treatments that can target and destroy non-functional leukaemic cells while sparing the precious few healthy cells that patients with the disease still produce.
“Therefore, one of our goals is to search for features of leukaemic cells that are not mirrored in normal cells. We use high-throughput biotechnologies such as next-generation sequencing to generate large amounts of molecular data from leukaemic and normal cells,” he said.
Beck's own experience illustrates how cost-effective supercomputing is changing what is possible in medical research.
“For example, there are now [genome sequencing] machines from Oxford Nanopore that you can plug into your computer by USB, and it’s around the size of a USB stick,” he said.
The genome comparisons Beck and his colleagues are crunching run into billions of lines of data, and still take several weeks using all the HPC power available to the researchers.
The data is crunched on a combination of on-premise HPC clusters and the resources of the NSW supercomputing consortium Intersect.
“For example, we use genome sequencing to determine the string of three billion letters that make up a patient’s cancer DNA and then ask which of these letters are present or absent when compared to normal DNA,” Beck said.
“In principle we would like to do this analysis on multiple patient samples. For example, for 1000 patients we have to analyse two sets of 1000 strings of three billion letters.”
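The per-letter comparison Beck describes boils down, in principle, to walking two aligned sequences and flagging every position where they differ. Here is a minimal Python sketch, assuming the cancer and normal sequences have already been aligned to the same coordinates (real pipelines rely on dedicated aligners and variant callers rather than naive string comparison):

```python
def diff_positions(cancer_seq: str, normal_seq: str):
    """Yield (position, normal_base, cancer_base) at every position
    where two pre-aligned, equal-length sequences disagree."""
    for i, (n, c) in enumerate(zip(normal_seq, cancer_seq)):
        if n != c:
            yield i, n, c

# Toy example: a single substitution at position 3.
print(list(diff_positions("ACGGAC", "ACGTAC")))  # [(3, 'T', 'G')]
```

At three billion letters per genome, even this linear scan implies trillions of base comparisons across a 1000-patient cohort, which is why the work lands on HPC clusters.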
Recent experiments have seen Beck's team test several thousand gene combinations at once to try to pick out just the kinds of characteristics that could differentiate between normal and sick cells.
“The computational power available to us via the HPCs [allowed] us to test every single possible gene combination [in a sample] and this [has] helped us to identify a novel prognostic biomarker which we aim to further develop in the future," said Beck.
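Beck doesn't specify the statistic his team used to score gene combinations; the sketch below assumes a crude separation score (the gap between group means of summed expression) purely to illustrate the exhaustive-search shape of the problem:

```python
from itertools import combinations

def separation_score(expr, labels, genes):
    """Crude score for a gene combination: absolute gap between the
    mean summed expression of sick and normal samples.
    expr maps sample -> {gene: value}; labels maps sample -> 'sick'/'normal'."""
    def group_mean(group):
        vals = [sum(expr[s][g] for g in genes)
                for s in expr if labels[s] == group]
        return sum(vals) / len(vals)
    return abs(group_mean("sick") - group_mean("normal"))

def best_combination(expr, labels, gene_names, k=2):
    """Exhaustively score every k-gene combination and keep the best."""
    return max(combinations(gene_names, k),
               key=lambda genes: separation_score(expr, labels, genes))

# Toy data: gene A separates the two groups, gene B does not.
expr = {"p1": {"A": 5, "B": 1}, "p2": {"A": 6, "B": 2},
        "n1": {"A": 1, "B": 1}, "n2": {"A": 2, "B": 2}}
labels = {"p1": "sick", "p2": "sick", "n1": "normal", "n2": "normal"}
print(best_combination(expr, labels, ["A", "B"], k=1))  # ('A',)
```

Even at pairs of genes, 25,000 candidates yield roughly 312 million combinations to score, which makes plain why an exhaustive search is only practical on supercomputing hardware.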
Cancer researchers are also making the most of changes to US law which require data generated from grant-funded research to be made freely available. This means a number of large and well-annotated DNA databases are accessible online, such as the NCBI Gene Expression Omnibus (GEO).
“It typically takes only a few hours to download more patient samples of a specific disease than are available to a cancer researcher at any one of the world’s best research institutes,” Beck explained.
“We started by accessing the expression of around 25,000 genes in around 750 patients from a freely available online repository called NCBI GEO, and we used the Intersect cluster to identify a smaller set of genes,” Beck said.
“We then collected around 200 patient samples from Australia, measured the smaller gene set identified and tested the predictions made from the public data on the data we collected ourselves.”
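The two-step workflow Beck describes (derive a gene signature from public data, then test it on an independently collected cohort) follows a standard cross-dataset train-and-validate pattern. Below is a minimal scikit-learn sketch with randomly generated arrays standing in for the GEO and Australian cohorts; the feature-selection method and classifier are illustrative assumptions, not Beck's actual pipeline:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for the public cohort: ~750 samples x 25,000 gene expression values.
X_public = rng.normal(size=(750, 25_000))
y_public = rng.integers(0, 2, size=750)  # 0 = good outcome, 1 = poor outcome

# Step 1: shrink 25,000 genes to a small candidate set using the public data.
selector = SelectKBest(f_classif, k=50).fit(X_public, y_public)
X_public_small = selector.transform(X_public)

# Step 2: fit a predictor on the public data...
model = LogisticRegression(max_iter=1000).fit(X_public_small, y_public)

# ...then test it on the locally collected cohort (~200 samples),
# which was assayed only for the 50 selected genes.
X_local = rng.normal(size=(200, 50))
y_local = rng.integers(0, 2, size=200)
print("validation accuracy:", model.score(X_local, y_local))
```

With random stand-in data the accuracy hovers around chance; the point of the pattern is that the signature is chosen and fitted entirely on the public data before it ever sees the independently collected samples.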