NSW Govt to open source digital archives software

 

Files to remain in state-owned data centre for hundreds of years.

NSW State Records has promised to open source any digital preservation software it writes for an archive of "born-digital" records that will sit in a new Western Sydney data centre.

The digital archives project is the State Government's first attempt to permanently and centrally store a portion of digital files that are beyond "immediate business use".

"Up until now with these permanent value digital records, government agencies have been obliged to simply maintain them in their own systems," digital archives project manager Cassie Findlay told iTnews.

"There hasn't been anywhere to send [the files] because we haven't had that capacity."

The state will permanently store digital files in much the same way it does physical files, such as photographs, documents and volumes.

It will house the digital archive in a new purpose-built data centre at the Western Sydney Records Centre, which currently acts as a central store for the physical archives.

The data centre will house an IT platform consisting of Cisco servers and switches, VMware virtualisation and an EMC Isilon storage system. It is to be designed and implemented by Logicalis Australia.

NSW State Records will accept a "relatively modest ... 15-20 separate transfers" of born-digital files from various agencies to test the new preservation system.

"We have goals around getting a certain number of transfers of digital records done within the timeframe of the project to make sure everything's working as it should," Findlay said.

If all goes to plan, the digital archive will begin accepting digital file transfers to state archives generally from mid-2013, when the digital archives project officially expires.

"We want to ... sell the benefits to government of not having that ongoing responsibility to keep those records accessible and usable in-house but rather send them to us, we will preserve and look after them," Findlay said.

"It won't cost the government agency anything anymore to do that, and they'll also be part of this big pool of government information."

Open source software

The IT platform will run open source digital preservation software, either written by the project's resident programmer or built on work from the global "digital preservation scene".

"We're developing our own software based on standards and frameworks that are in place in other libraries and archives," Findlay said.

The department's own code will be made available via GitHub, where it has previously released an experimental API for its catalogue of physical items.

"We haven't put up a lot of stuff to do with digital preservation activities yet [on GitHub]," Findlay said.

Publishing the code on GitHub could bring interoperability and sharing benefits if the software "becomes widely used".

It could also provide access to a potentially wider base of developers than the project could otherwise afford to engage.

"Like any government department we have limited resources to commit to this ongoing so we want to do it as collaboratively as we can," Findlay said.

"In digital preservation there are a lot of issues that can come up that you maybe didn't think of yourself or that come slightly unexpectedly, so if there are people out there that are also grappling with some of these things and developing similar software or using our tools then that's a help for us."

Non-proprietary architecture

The decision to go down the open source route is based partly on the need for openness, a key challenge in any archival project.

Findlay said the NSW digital archives project aimed to preserve born-digital files "so they will be readable in 100 or 200 years time".

"This digital archive will have to be around forever so [it has to be] as non-proprietary as possible," she said.

That affects decisions about both the systems that hold the files and the formats in which the files are stored.

"We can't set up a digital archives system that has too many dependencies on vendors or ongoing licensing costs," Findlay said.

"If you have a format that [is] proprietary and you don't have access to the underlying information to know how it's read and presented, then down the track if the software is less freely available, you've got a problem if [the] company goes out of business and no longer makes the software to read it.

"You [also] can't have a situation where you have digital information that relies on licensing and payment to corporations for it to continue to be read.

"If you're managing a digital archive with terabytes of data in all manner of formats from all different sorts of systems, there's no way you could take it on knowing you've got to somehow pay for it to be accessible."

Findlay said that file transfers made to the project would be evaluated against principles of openness. She said that files may be converted if issues arose.

"There are tools in the digital preservation world to help us to do that," she said.

"For example, the National Archives developed a tool some years back called Xena which converts a range of formats into more open formats. A classic example is Word into ODF."

Searching the archive

Findlay said the project team is working on a metadata management plan for the digital archives that will, among other things, determine how far individuals and organisations can search across and harness this new store of government data.

"As well as the more technical preservation metadata that we'll need to manage format issues over time, we have certain metadata that we know we'll want to manage in a fairly structured way because it's to do with the legal aspects of managing the records," Findlay said.

"We're also looking at how we can manage, index and retrieve the metadata and the record contents in a very powerful way so people can analyse across large data sets and pull out - providing they're open access of course - information on a subject basis across a whole range of government departments, which is just impossible to do with older physical sets of records which are in boxes.

"It's exciting to have that possibility."

Public access to stored records would still be negotiated with the agencies involved.

"Some will be open and available online very early, some will be closed for longer periods of time and there are processes for that," Findlay said.

Digitisation complementary

The digital archives project is separate from a long-running digitisation program in NSW.

Where the archives project is about taking in "born-digital" records for the first time, digitisation is about making electronic copies of "older and popular" physical documents held in storage in Sydney's west.

"The digitisation program we have is about us trying to get more of our legacy material in the state archives collection online and available," Findlay said.

"[Digital archives is] about having [agencies] send to us ... born-digital records, so they were never paper, that are required to be kept permanently as archives."

The digital archives project falls under the auspices of a broader NSW Government digital record keeping initiative called Future Proof, which has been running since 2008.

The initiative aimed to foster good record keeping practices in NSW Government agencies "for their own purposes as well as down the track for the State Archives collection".

Copyright © iTnews.com.au. All rights reserved.

