An increasing number of websites use scripts that record many aspects of user behaviour through browsers, creating a substantial privacy threat, researchers have found.
Researchers Steven Englehardt, Gunes Acar and Arvind Narayanan studied seven companies that provide so-called session replay services for websites: Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale, and SessionCam.
Websites use session replay scripts to improve user experience, but the scripts go well beyond normal website analytics when it comes to collecting data from users, the researchers found.
"These scripts record your keystrokes, mouse movements, and scrolling behaviour, along with the entire contents of the pages you visit, and send them to third-party servers," they wrote.
"Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder."
Once a user session has been recorded, publishers can view it through a dashboard page at the service provider. In some cases the replay is sent over the clear-text HTTP protocol, even though the original user session took place over encrypted HTTPS, creating another risk of information leaking.
Users are given no indication that their interactions with sites and browsers are being collected in such detail.
Session replay services were found on 482 of the top sites in web metrics company Alexa's index, the researchers said.
A range of highly sensitive information can be captured from users. The researchers said this could include data such as medical conditions, credit card details, and other personal information displayed on web pages.
The session replays could expose users to identity theft, online scams and other threats.
Data collected through session replay is visible to site publishers, with some providers explicitly linking the information gathered to users' identities.
While the session replay services attempt to redact some data, such as passwords, some sites with mobile-friendly login boxes could still leak this information.
The redaction of sensitive user input is only partial and imperfect, the researchers found.
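To illustrate how partial redaction can fail, the sketch below masks only form fields whose declared type is "password". It is a simplified assumption about how such a filter might work, not the code of any of the services named above. The `FormField` class, the `redact_by_type` function and the `password_visible` field are hypothetical names, and the idea that a mobile-friendly login box mirrors the password into a plain text field is likewise assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class FormField:
    name: str
    input_type: str  # e.g. "password", "text", "email"
    value: str


def redact_by_type(fields):
    """Mask only the fields whose declared type is "password"."""
    recorded = {}
    for field in fields:
        if field.input_type == "password":
            recorded[field.name] = "*" * len(field.value)
        else:
            # Anything not declared as a password is recorded as typed.
            recorded[field.name] = field.value
    return recorded


# Hypothetical login form: the second field is a plain text input that
# mirrors the password so it can be shown unmasked on a small screen.
login_form = [
    FormField("password", "password", "hunter2"),
    FormField("password_visible", "text", "hunter2"),
]

recorded = redact_by_type(login_form)
print(recorded)
```

In this sketch the masked field is hidden, but the mirrored text field carries the same password into the recording in the clear, because the filter looks at the field's type rather than its contents.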
Tracking-protection preferences set by users, such as Do Not Track (DNT), are often not honoured by publishers, and ad-blockers often don't block the session replay scripts, the researchers found.
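A browser with Do Not Track enabled adds the header `DNT: 1` to its requests. As a rough sketch of what honouring that preference could look like on the publisher's side, the example below decides whether to include a replay script based on that header. The function names `user_opted_out` and `should_load_replay_script` are hypothetical, and the request headers are assumed to arrive as a simple dictionary.

```python
def user_opted_out(headers):
    """Return True if the request carries the header `DNT: 1`."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False


def should_load_replay_script(headers):
    """Skip session recording for users who asked not to be tracked."""
    return not user_opted_out(headers)


print(should_load_replay_script({"DNT": "1"}))
print(should_load_replay_script({"User-Agent": "ExampleBrowser"}))
```

A check of this kind works only if the publisher chooses to apply it, since the header expresses a preference and does not block anything by itself.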