IT engineers from Komatsu and its managed hosting partners Telstra and Fujitsu spent the Easter and Anzac long weekend failing over the entire IT infrastructure of the industrial equipment giant to run at secondary facilities, in an exercise that tested Telstra's cloud credentials.
Komatsu is one of a small number of Australian corporations that have outsourced their entire IT infrastructure to Telstra’s IaaS service, branded ‘Network Computing Services’.
The company’s production x86 workloads (SAP, Microsoft Exchange and SharePoint, SQL, Navision and print and file, among others) run in a virtualised stack in Telstra’s Pitt Street data centre, with disaster recovery and data mirroring to the telco’s Melbourne facility.
Its mainframe workloads (a transactional system for its spares and services business) are co-located in the Sydney facility, with a secondary option on standby at Fujitsu’s North Ryde facility.
Komatsu CIO Ian Harvison told iTnews today that one way to mitigate the risk of pushing all of its infrastructure to a third party was to run live disaster recovery exercises rather than mere simulations.
Harvison “declared” an emergency at 10pm on the Friday night of the Easter and Anzac long weekend.
Komatsu and Telstra's engineers then gave up their break to migrate Komatsu’s entire set of workloads to secondary facilities.
Komatsu set Telstra a tiered set of recovery point and recovery time objectives to meet. Applications that supported failover and load balancing would migrate instantaneously; a ‘B’ and ‘C’ group of applications (such as the 14-terabyte SAP environment) had to be operational within 24 hours; and a ‘D, E and F’ group of applications (business systems such as email, file and print) would need to be migrated within a “best effort” timeframe.
The results were impressive – the SAP applications were operational in under four hours, the ‘C’ group in under an hour, the ‘D’ group in 45 minutes, ‘E’ in two hours and ‘F’ in four hours.
The only relative laggard was the company’s mainframe environment, which had to be rebuilt from the last tape backup, a task that took 24 hours to complete.
Harvison said the test was ultimately very successful, with only three seconds of missing data and “no service degradation” beyond staff being unable to connect to systems remotely.
The teams then ran the Melbourne (Telstra) and North Ryde (Fujitsu) secondary environments as production systems for two days.
“We have 24 x 7 agreements with customers like Rio Tinto – contracts that ensure we can get parts to their sites at any point in time,” he said.
“We literally ran transactions – part sales – for two days on our DR boxes.”
But Harvison didn’t want to stop there: the company also “failed back” onto its production systems in Sydney within the same weekend.
“We can say with our hands on our hearts that we can failover and run our organisation’s entire infrastructure in failure mode,” he said.
“It was Telstra’s first real test of failing over a customer on its cloud platform.”