A recent analysis of Microsoft Word, PowerPoint and Excel files available on the websites of some of the world's biggest firms identified "thousands of user IDs and email addresses, comments and track changes, hundreds of PowerPoint files containing obsolete text and speaker notes, and thousands of files that contain network paths".
The study was conducted for Bitform, a developer of software components for content inspection and security. It analyzed 8,038 files for more than two dozen specific types of metadata and hidden information that could expose proprietary or confidential information, breach corporate policies or open security holes.
It warned that there were "thousands of instances" of information exposure that were likely never meant to be made public. One such instance was a white paper from a computer manufacturer containing comments, intended for internal review only, that acknowledged scalability limitations of a partner product.
Another publicly accessible document found on a major firm's website was a press kit from an automaker announcing a new model. Its author information listed more than 500 user names as contributors.
The study also found a contract from a telecommunications company containing dozens of tracked changes, both insertions and deletions, that potentially exposed negotiable terms. Another example was a customer presentation from an equipment manufacturer that contained a comment questioning whether the facts on one slide were accurate. The name of a previous presenter had been deleted from its first slide but was still recoverable as fast save data.
"As content-related security in general and the inside-out security threat in particular continue to gain focus, it's remarkable how much information is accidentally exposed through seemingly benign documents that organizations generate every day," said Joe Keslin, chief executive officer and co-founder of Bitform.
"You don't have to expose your trade secrets to open your organization to potential harm. For instance, what we call Outlook Properties is a great example of information that you probably don't want to expose to the world. This includes a user's email display name, the subject line of the email that contained the file attachment and the sender's email address. As an executive, I don't want my employees' display names and email addresses made available to competitors, recruiters, social engineers, hackers or anyone else that we don't explicitly want to share this information with."
Keslin added that it was ironic that companies spend significant sums on systems that protect against spam, phishing and intrusion, yet provide fodder for those very threats by sharing the proprietary information contained in the metadata and hidden information identified in the study.
The most commonly exposed types of information, as a percentage of all documents analyzed, were:
* 45 percent contained Author History: a list of user names of individuals who had opened and saved the document. These names are in addition to the author name in the document's summary properties, and cannot be seen through Word's interface.
* 37 percent of the files included a file path containing a user name, showing where the file had been stored on that user's system.
* 31 percent contained printer information, namely the name of the default printer on the author's system. Eighteen percent of the files included printer information that also exposed a network share name.
* 14 percent of the documents included both an author history and an associated network share name where the document was stored at some point in its lifecycle.
* 17 percent exposed "Outlook Properties," custom properties that include a user's email display name, email address and the subject line of the email to which the file was attached.
* 14 percent of the files were PowerPoint presentations that included speaker notes.
* 10 percent contained fast save data: text that had been deleted from Word documents and PowerPoint presentations, and was no longer visible through the application interface, but that was still part of the electronic file.
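The study does not describe how its files were inspected. As a minimal sketch of how a few related items can be checked, the Python function below opens a Word file in the zip-based .docx (Office Open XML) format, which is an assumption: categories such as Author History, printer information and fast save data are not covered. It reads the creator and last-modified-by names from `docProps/core.xml`, counts reviewer comments in `word/comments.xml`, and counts `w:ins`/`w:del` tracked-change elements left in `word/document.xml`. The helper name `inspect_docx` is hypothetical, and it uses only the Python standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by the Office Open XML (.docx) package format.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}


def inspect_docx(source):
    """Report a few kinds of hidden information in a .docx file.

    `source` may be a path or a binary file-like object. Hypothetical
    helper for illustration; not the tool used in the study.
    """
    report = {
        "creator": None,
        "last_modified_by": None,
        "comments": 0,
        "insertions": 0,
        "deletions": 0,
    }
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())

        # Core properties: who created the file and who last saved it.
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            creator = core.find("dc:creator", NS)
            last = core.find("cp:lastModifiedBy", NS)
            if creator is not None:
                report["creator"] = creator.text
            if last is not None:
                report["last_modified_by"] = last.text

        # Reviewer comments are stored in their own package part.
        if "word/comments.xml" in names:
            comments = ET.fromstring(package.read("word/comments.xml"))
            report["comments"] = len(comments.findall("w:comment", NS))

        # Tracked changes that were never accepted or rejected remain in
        # the body as w:ins / w:del elements; counting them is a rough proxy.
        if "word/document.xml" in names:
            body = ET.fromstring(package.read("word/document.xml"))
            report["insertions"] = len(body.findall(".//w:ins", NS))
            report["deletions"] = len(body.findall(".//w:del", NS))
    return report


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, inspect_docx(path))
```

Run it as `python inspect_docx.py report.docx` to print one dictionary per file. Reading the XML parts directly, rather than opening the file in Word, is the point: the interface hides exactly the data this kind of audit looks for.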
"This study raises a number of questions. The most obvious is whether organizations and individual users really understand what information is being shared when they distribute, email or publish an Office file," said Keslin. "By performing analysis on a well-defined collection of files, we've been able to quantify this issue beyond the occasional incident that ends up in the press."
He added that the high rate of sensitive information exposure among Fortune 100 companies is alarming, considering the significant IT resources available to them.
"I suspect this problem is more severe for smaller companies that don't have the resources or processes to review the information that is made available to the public. Further, we've only looked at files that were available to anyone who visits a Fortune 100 website. I can only imagine what we'd find if we inspected files that are shared with third parties via email, posted to partner extranets or employee portals."