File format fail behind England's COVID case underreporting

Source: Twitter/R Widerstrom

Health agency developers used old Excel .XLS files.

Developers' choice of a more than 30-year-old file format to store COVID test result data was behind the underreporting of almost 16,000 cases in England, it has emerged.

The missed cases were not discovered until Friday last week.

Public Health England (PHE) collected data from commercial firms that analysed the swab results to see who tested positive or not, and received the information in files with the values separated by commas.

Using comma-separated values (CSV) files is common practice for handling data.

PHE loaded the files into Microsoft Excel spreadsheet templates to be entered into a central system used for government contact tracing and reporting dashboards.

However, the PHE developers used the original binary .XLS file format for Excel, which first appeared in 1987.

Unlike the Office Open XML-based .XLSX format that was introduced in 2007, .XLS can store only 65,536 rows and 256 columns.

By comparison, the newer .XLSX and .XLSM formats can handle 1,048,576 rows and 16,384 columns.
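
To illustrate the row limits above, here is a minimal Python sketch that counts the rows in a CSV stream and checks whether they would fit in a single worksheet. It is not PHE's code: the function names and the sample data are hypothetical, and only the two row limits come from the figures quoted in this article.

```python
import csv
import io

# Worksheet row limits quoted in the article.
XLS_MAX_ROWS = 65_536
XLSX_MAX_ROWS = 1_048_576


def count_csv_rows(stream):
    """Count the records in a CSV stream, including any header row."""
    return sum(1 for _ in csv.reader(stream))


def fits_in_sheet(row_count, max_rows=XLS_MAX_ROWS):
    """Return True if row_count rows fit in a single worksheet."""
    return row_count <= max_rows


# Hypothetical data: one header row plus 70,000 result rows.
sample = io.StringIO(
    "id,result\n" + "".join(f"{i},negative\n" for i in range(70_000))
)
rows = count_csv_rows(sample)

# Too many rows for .XLS, but well within the .XLSX limit.
print(rows, fits_in_sheet(rows), fits_in_sheet(rows, XLSX_MAX_ROWS))
```

A check like this, run before loading, would let a pipeline stop with an error instead of silently dropping rows past the limit.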

The BBC reported that, because of the old file format, each Excel template was limited to recording around 1,400 cases.

Any cases beyond that were ignored by the template, which led to 15,841 going unreported between September 25 and October 2. That may have left people unaware of COVID-19 exposure during that time.

A temporary workaround that splits files into smaller ones has now been put into place to avoid hitting the .XLS limits and cases being dropped.
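
The general idea of splitting a file to stay under a row limit can be sketched in a few lines of Python. How PHE implemented its workaround has not been described, so this is only an assumption-laden illustration: `split_rows` is a hypothetical helper, and repeating the header row in every chunk is a design choice made for the example.

```python
import csv
import io


def split_rows(reader, max_rows):
    """Yield lists of rows, each at most max_rows long.

    The first row is treated as a header and repeated at the top of
    every chunk (an assumption made for this sketch).
    """
    if max_rows < 2:
        raise ValueError("max_rows must leave room for a header and one row")
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to yield
    per_chunk = max_rows - 1  # one row per chunk is reserved for the header
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == per_chunk:
            yield [header] + chunk
            chunk = []
    if chunk:
        yield [header] + chunk  # final, partially filled chunk


# Hypothetical data: a header plus 10 rows, split with a tiny limit of 4.
sample = io.StringIO("id,result\n" + "".join(f"{i},positive\n" for i in range(10)))
chunks = list(split_rows(csv.reader(sample), max_rows=4))
print([len(c) for c in chunks])  # prints [4, 4, 4, 2]
```

With `max_rows=65_536`, each chunk would fit within the .XLS row limit, and no data row is dropped because the last partial chunk is still emitted.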

As of writing, it is not clear if PHE intends to continue using Excel or to move to a different data processing system.
