As readers of this blog will know, I am a strong advocate of online privacy… That may sound strange coming from a web analytics evangelist. However, if we as an industry do not sort these privacy issues out, there is a real danger that web analytics as we know it today will disappear completely.

So, following Phil Kemelor’s excellent recent post, ‘The FTC Privacy Report, “Do Not Track” Options and Web Analytics’, I wanted to add my own take here…



I was disappointed that the FTC privacy report tackled only the issue of Personally Identifiable Information (PII). To my knowledge, all developed countries already have good data protection laws covering this. Essentially, these mean that “you” can only store PII with the explicit permission of the person concerned, and you must disclose it to that person should they request it. See, for example, the UK Data Protection Act.

What I was hoping for from the FTC was a position on non-PII data collection, that is, collecting data that does not DIRECTLY identify the individual. I emphasise directly because, with so many web data points available from an anonymous user, it is possible for an organisation to “triangulate” non-PII data and build up a pretty sophisticated profile of the person – ultimately identifying them.

A classic case of this happening was the AOL data scandal of 2006. AOL released a large volume of “anonymised” search query data, intended for research purposes, which New York Times journalists (and others) were able to analyse and then use to identify individual searchers.
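To make the triangulation point concrete, here is a minimal Python sketch. The visit records are entirely made up for illustration, and the field names (`city`, `browser`, `screen`, `search`) are hypothetical stand-ins for the kind of non-PII attributes an analytics tool can see. No single attribute identifies anyone, but each extra attribute shrinks the set of matching visits until only one is left.

```python
# Hypothetical, made-up visit records: no names, emails or other PII,
# only the sort of attributes a web analytics tool routinely collects.
visits = [
    {"city": "Leeds", "browser": "Firefox", "screen": "1280x800", "search": "plumber leeds"},
    {"city": "Leeds", "browser": "Chrome", "screen": "1280x800", "search": "plumber leeds"},
    {"city": "Leeds", "browser": "Firefox", "screen": "1920x1080", "search": "cheap flights"},
    {"city": "York", "browser": "Firefox", "screen": "1280x800", "search": "plumber leeds"},
    {"city": "Leeds", "browser": "Firefox", "screen": "1280x800", "search": "dentist leeds"},
]


def matching(records, filters):
    """Return the records whose fields equal every value in `filters`."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]


def narrowing(records, steps):
    """Add one attribute at a time and count how many records still match."""
    filters = {}
    counts = []
    for key, value in steps:
        filters[key] = value
        counts.append(len(matching(records, filters)))
    return counts


steps = [
    ("city", "Leeds"),
    ("browser", "Firefox"),
    ("screen", "1280x800"),
    ("search", "plumber leeds"),
]
print(narrowing(visits, steps))  # prints [4, 3, 2, 1]
```

With four harmless-looking data points, one “anonymous” visit is singled out; a real data set simply has many more attributes to combine.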

Track in Aggregate

I think (hope!) web users are pretty savvy when it comes to sharing their PII on the web – in the same way that you wouldn’t hand your personal details to a stranger in the street.

Tracking individuals as “individuals” on the web (as opposed to in aggregate), even when they are anonymous, poses a greater privacy threat because it is unregulated. As more people realise this, we could reach a critical mass of people blocking vendor tools such as Omniture, Google Analytics, Coremetrics, etc., to the point where the data becomes so unrepresentative that it is meaningless.

The answer is to track your web visitors only* in aggregate, that is, to look at metrics that represent a segment or group rather than an individual. That way, an individual can never be identified. Yes, this is a compromise. Individual data is much more interesting to marketers – “we could target a potential customer with laser-like precision”. But in reality this rarely happens. After all, the visitor is still anonymous, which means there is still a lot of guesswork to be done with your laser.
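As a rough illustration of what “reporting in aggregate” can mean in practice, the Python sketch below summarises visits per segment and returns only segment-level totals, never visitor-level rows. The data, the `source`/`converted` fields and the `MIN_SEGMENT_SIZE` threshold are all hypothetical. The threshold is an extra safeguard of my own choosing (often called small-cell suppression), not something the argument above requires: a “segment” of one visitor is really just an individual again.

```python
from collections import defaultdict

# Hypothetical threshold: segments with fewer visits than this are not reported,
# because a tiny segment can point straight back to one person.
MIN_SEGMENT_SIZE = 3


def aggregate_report(visits, segment_key, min_size=MIN_SEGMENT_SIZE):
    """Summarise visits per segment; no visitor-level rows are returned."""
    totals = defaultdict(lambda: {"visits": 0, "conversions": 0})
    for v in visits:
        seg = totals[v[segment_key]]
        seg["visits"] += 1
        if v["converted"]:
            seg["conversions"] += 1

    report = {}
    for name, t in totals.items():
        if t["visits"] < min_size:
            continue  # suppress segments too small to report safely
        report[name] = {
            "visits": t["visits"],
            "conversion_rate": round(t["conversions"] / t["visits"], 3),
        }
    return report


# Made-up visit data: note there is no visitor ID field at all.
visits = [
    {"source": "search", "converted": True},
    {"source": "search", "converted": False},
    {"source": "search", "converted": False},
    {"source": "search", "converted": True},
    {"source": "email", "converted": True},
    {"source": "email", "converted": False},
    {"source": "email", "converted": False},
    {"source": "banner", "converted": True},
]

print(aggregate_report(visits, "source"))
# prints {'search': {'visits': 4, 'conversion_rate': 0.5}, 'email': {'visits': 3, 'conversion_rate': 0.333}}
```

The single “banner” visit is left out of the report entirely, while the marketer still gets the segment-level insight (search converts better than email) that most decisions actually need.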

IF (and it’s a big if) all web analytics reporting were conducted in aggregate, I feel the privacy fears many people have about web analytics would all but disappear, safeguarding our industry long into the future. That’s a very large upside compared to the very small downside of not having individual visitor-level data.

As always, I would be interested in your comments on this subject…

*If a visitor is an existing customer or subscriber, or has previously given you their PII, then of course tracking them as an individual makes sense, so long as they “identify” themselves each session, i.e. by logging in.

Related article from Vicky Brock of the WAA: