Researchers rue cost of public cloud data haul


Can cloud services address researchers’ computing demands?

With tight budgets and varying computational needs, researchers seem a good match for the scalable, utility-based cloud.

Today's cloud customers can purchase anything from email to supercomputing as a service through vendors like Amazon Web Services, Google and Steam Engine.

But researchers in Australian laboratories are still concerned about data ownership, transfer costs and vendor lock-in.

Guido Aben, e-Research director of Australian research network AARNet, says overseas data centres are an issue for researchers with bandwidth limitations or information to protect.

AARNet currently provides a beta "store and forward" service called CloudStor that uses a recently upgraded 6TB of local storage to transfer files between collaborating research teams.

CloudStor has been designed to overflow into Amazon's S3 Cloud Storage Service if it runs out of local disk space. But this will not be enabled until Amazon establishes a local data centre, Aben says.
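The overflow behaviour Aben describes can be pictured with a minimal sketch. It is not CloudStor's actual code: the class name `OverflowStore`, its fields and the tier labels are all hypothetical, and the sketch only records where a file would be placed rather than calling any real storage API.

```python
from dataclasses import dataclass, field


@dataclass
class OverflowStore:
    """Hypothetical store-and-forward pool with an optional overflow tier."""

    local_capacity_bytes: int
    overflow_enabled: bool = False  # assumption: off until a local data centre exists
    local_used_bytes: int = 0
    placements: dict = field(default_factory=dict)

    def place(self, name: str, size_bytes: int) -> str:
        """Record which tier a file would land on and return that tier's label."""
        if self.local_used_bytes + size_bytes <= self.local_capacity_bytes:
            # The file fits on local disk, so it stays inside the research network.
            self.local_used_bytes += size_bytes
            tier = "local"
        elif self.overflow_enabled:
            # Local disk is full; spill to the external (cloud) tier.
            tier = "overflow"
        else:
            raise RuntimeError("local storage full and overflow is disabled")
        self.placements[name] = tier
        return tier
```

With `overflow_enabled` left at `False`, a full local pool rejects new files instead of sending them offshore, which mirrors the position Aben outlines.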

"The geography of Australia simply dictates that if you have to ship data out of Australia, the economy of cost completely changes," he says.

"Even if I save [money] to store data out of Australia, it costs me [money] to ship data. If cloud compute providers don't set up shop in Australia, it doesn't make sense for us."
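Aben's trade-off can be written as simple arithmetic. The sketch below is illustrative only: the function names and every price in the usage note are hypothetical, and it assumes flat per-gigabyte pricing for both storage and transfer.

```python
def total_cost(storage_per_gb: float, transfer_per_gb: float,
               data_gb: float, transfers: int) -> float:
    """Cost of storing data_gb once and moving it `transfers` times."""
    return data_gb * (storage_per_gb + transfer_per_gb * transfers)


def offshore_is_cheaper(local_storage_per_gb: float,
                        offshore_storage_per_gb: float,
                        offshore_transfer_per_gb: float,
                        data_gb: float, transfers: int) -> bool:
    """Compare local storage (assumed free to move) with offshore storage."""
    local = total_cost(local_storage_per_gb, 0.0, data_gb, transfers)
    offshore = total_cost(offshore_storage_per_gb, offshore_transfer_per_gb,
                          data_gb, transfers)
    return offshore < local
```

Using made-up prices of 0.30 per gigabyte for local storage, 0.10 for offshore storage and 0.15 per gigabyte shipped, `offshore_is_cheaper(0.30, 0.10, 0.15, 100, 1)` is true, but a second transfer of the same data tips the comparison back in favour of local storage.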

Aben says universities are formulating their own strategies about if and how they will use the cloud.

"It's coming to a point where you have to ask yourself as a university, 'Do I have to own my compute power' as much as you ask yourself, 'Do I want a car, or do I want to take taxis all the time'," he says.

"Let's be honest: the marketing is really good, yet experts tell me that most people don't really know what it really is."

Tom Fifield, a high energy physics grid research programmer at the University of Melbourne, favours the characteristics set out in NIST's definition: on-demand self-service; broad network access; resource pooling; rapid elasticity; and a metering capability.

Using grid workload management software called DIRAC, Fifield is using Amazon EC2 to process data from the Belle/Belle II experiment, which investigates why matter is more commonly observed in nature than antimatter.

DIRAC distributes jobs between local resources, Amazon EC2 and the Belle grid. Between January and August, it processed 21TB of data with 27.8 percent of computing power from Amazon, 38.2 percent from collaborators in Japan and 5.82 percent from Melbourne.

"Across physics, a lot of people are looking into it [infrastructure-as-a-service]," Fifield says. "Sometimes, you find that you just need more capacity ... we have several large peak computing demands."

But while processing is cheaper on Amazon's cloud, transferring data to EC2, which sits outside the 10Gbps AARNet and its partner networks, is not. Data is routed to and from Melbourne via Japan, from where it is cheaper to move data to and from Amazon's cloud.

Another concern is vendor lock-in. Belle/Belle II researchers believe Amazon has the best pricing, community, and features for their project at the moment, but this may not always be the case.

Fifield says the researchers are "embracing the cloud model", preparing themselves for changes by storing all data on their own grid.

And with DIRAC set up, Fifield expects to need only a week to move to, or add, another cloud provider.

Commercial cloud services are less appealing to University of Sydney physiotherapy researcher Marlene Fransen, who uses AARNet's CloudStor to collaborate with researchers in Austria.

While CloudStor lets Fransen electronically transfer, in minutes, data that previously had to be shipped, she considers infrastructure-as-a-service "not applicable" to her work due to the sensitivity of patient information and use of a specific software program.

"I think some unis must now be starting to realise that the honeymoon period for cloud is over and it's time to draw your own strategy as to what you're going to do," AARNet's Aben says.

"There is the promise of enormous compute cycles, but there is a risk of getting locked into a particular compute paradigm by setting up your experiments to use their platform exclusively.

"We're really keen to keep an eye on it and see what happens," he says.
