The John Curtin School of Medical Research at the Australian National University is turning to the cloud to drive the data crunching underpinning its genomic research on cancer and mutations.

Although the researchers have access to on-prem high-performance compute power - and slightly off-prem at the National Computational Infrastructure (NCI) across the road - their work often requires only short, intense bursts of analytics, leaving the HPC capabilities sitting idle in between.
Research fellow in the Department of Genome Science Dr Sebastian Kurscheid told iTnews that a flexible solution for storing, analysing and sharing the terabytes of DNA data was needed to test hypotheses and come up with new ones.
“One thing we are trying to figure out at the moment is how in a biological sense disturbing a biological system by introducing a mutation, for example, changes the transcriptional activity within cells, by which we mean which genes are expressed and in which way they are expressed,” Kurscheid told iTnews.
“Because cells basically have a way to put some parts of a gene together in a different order, and that essentially changes the function of that gene.
“So what we are exploring at the moment is how that changes between two conditions and if we introduce a mutation in certain cells can we then see changes that are specifically caused by this mutation. And that requires quite a lot of computational analysis to get to a result there.”
But one of the bottlenecks he and his team have faced has been finding more interactive ways of managing and analysing that data.
Booking time on HPC facilities at the university or NCI was one of the biggest challenges, with the genome science department competing with other researchers.
Finding the right resources for visualising the data, when one human genome alone can reach 100 GB, was also a challenge.
Since June the school has been moving various workloads and pipelines into Azure to test how cloud environments compare to its legacy systems.
“I think that what’s really come out of this is that it’s an additional tool in our toolbox that we can now utilise more frequently and that fills in the gaps in that environment we have and that looks like a hybrid approach to data analytics that we’re now pursuing,” Kurscheid said.
As the cloud integration continues, Kurscheid also plans to take advantage of the machine learning and artificial intelligence capabilities available through Azure.
The added insights and capability to visualise terabytes’ worth of data would be invaluable as the field of medicine increasingly looks to personalised treatments for diseases based on a patient’s unique DNA profile.
Traditionally, personalised medicine has centred on rare conditions. However, Kurscheid said medical genomics is increasingly relevant to clinical practice and to understanding more common diseases.
He hopes the field will be more attractive to researchers and institutions if they can similarly offload the expensive and time-consuming IT burden of research into the cloud.
At ANU, he’s also seen that cloud can reduce the amount of administration involved in establishing workflows - an attractive proposition given Kurscheid estimates a third of his 3.5-year research program has been spent dealing with technical overheads.
Before adopting Azure, two weeks were spent building a blueprint for data entry and analysis that could be applied to future workloads with minimal modifications.
Kurscheid said he could have saved nine months if the research had been conducted in the one environment.
A single cloud environment is similarly beneficial for international collaborations, with Kurscheid now able to easily share data, analytical workflows, and outcomes with colleagues in Norway and the US. Smaller institutions could also tap into the growing resource without having to set up their own HPC facilities.
“Part of the long-term vision is that in the medical field genomics becomes more widely available – it’s already important in rare diseases,” Kurscheid said.
“As it becomes more common, smaller hospitals or pathology services might see demand for this. I think that making these workflows and tools and analysis pipelines publicly available in a manner that is adaptable for others would support the broader uptake of genomics in the medical field.”