Perth’s Curtin University has nearly completed a six-month trial of Microsoft’s Windows Azure platform-as-a-service to speed up DNA genome sequencing of extinct animal species.
Through the trial, the institution hoped to determine the viability of transmitting data in a hybrid cloud environment between assigned Azure nodes hosted in Singapore, internally hosted high-performance computing clusters and a genome sequencing machine at the Royal Perth Hospital.
Curtin has readily adopted cloud services for student email, compute and storage, and earlier this year began transitioning to Microsoft’s Office 365 productivity suite for staff.
Systems engineer Amandeep Sidhu said he was given free rein to trial any number of cloud services for the sequencing initiative.
"While we have been doing a lot of compute storage as researchers on the national compute facilities - iVEC [in Western Australia], NCI [in Canberra] and all - it would be interesting to see if I can port what I have been doing to cloud,” he said.
Biomedical researchers sourced DNA from a chosen animal, often extinct, and sent it to a genome sequencer that had been recently installed at the hospital.
The resulting text file ranged from one to 100 gigabytes in size and was compared with the sequence of the same or a neighbouring animal species to ensure its accuracy, a process that often took weeks on internal or national compute clusters.
Since March, Curtin has used Azure to study genome sequences of four animals.
It expected to add a further three before the end of October, including an extinct breed of Peruvian alpaca. Each analysis typically took several hours to a day on one or more Azure nodes.
“The hardest part is when you do not have a reference, you have no idea where you’re going,” Sidhu said, referring to the case of the alpaca.
"We don’t have anything to go from, the closest we can come is the descendants of the species, its closest possible neighbour in the chain.”
A complex human genome sequence - which initially took millions of dollars and 13 years to complete - could now be analysed in less than a week in the cloud.
"Where Azure and HPC works pretty well is the on-demand compute and storage of [data]," Sidhu said. "You get the flexibility of scaling up an Azure node pretty easily and vary the size of the compute and storage that you need."
Curtin chose to use Azure over rival Amazon Web Services due to performance and the relative ease of transition for researchers from a technical standpoint.
After completing the Azure trial later this year, the university would attempt to form a public-private hybrid. Successfully communicating between the internal Cisco vBlock clusters and Azure would likely form the biggest obstacle, Sidhu said.
"I’m hopeful that next year when we do have sequencing facilities in-house, I’m able to automate to an extent where the data goes from the sequencer facility to the Windows HPC.
"[The automation system will] basically determine ... what workflow it should use and automatically provisions nodes in the cluster, and doesn’t discriminate what’s a local compute and what’s an Azure compute.”
Microsoft’s research arm has also released a version of the NCBI BLAST sequence-search tool for Azure, which allowed researchers to analyse their data against a freely accessible global database.
Research middle men
Alongside the cloud computing trial, Sidhu was also tasked with communicating the technology requirements of researchers to IT services at the institution.
Since March, he had held the dual roles of systems engineer and adjunct research fellow for the university’s biomedical sciences faculty, effectively becoming a middle man between departments for IT-heavy initiatives such as the DNA sequencing trial.
Similar roles had since been introduced for each of the university’s faculties, including engineering, humanities and science.
According to Curtin chief information officer Peter Nikoletatos, the new roles were part of the university's deliberate move to drop the “e” from “eResearch” and make IT’s role in projects more commonplace.
“At an operational level, seconding [Sidhu] into our team provided a real researcher’s mindset to help build our capability,” he said.
“We have invested in Microsoft HPC and Azure to achieve this ‘service’ and more importantly the link between what he does and his research peers gets a significant boost.”
James Hutchinson travelled to TechEd 2011 on the Gold Coast as a guest of Microsoft.