
Last year I wrote a whitepaper on web analytics accuracy. The intention was for it to be a reference guide to all the accuracy issues that on-site web measurement tools face, and how you can mitigate the error bars. Apart from updating the article recently, I wanted to illustrate how close (or not) different vendor tools can be when running on the same website, when it comes to counting the basics – visits, pageviews, time on site and visitors.

To do this, I have looked at two very different web sites with two tools collecting web visitor data side by side:

  • Site A – This blog, running Google Analytics and Yahoo Web Analytics. According to Google, there are 188 pages in the Google Index and traffic is approximately 10,000 visits/month
  • Site B – A retail site that runs Nielsen SiteCensus and Google Analytics (site to remain anonymous). According to Google, there are 12,808 pages in the Google Index and traffic is approximately 1 million visits/month

These are obviously two very different web sites with two different objectives…

Site A is relatively small in content and visit volume, with the purpose of driving engagement, i.e. readership and reader interaction via comments, ratings, click-throughs etc. For this site, I had complete control of the deployment of both web analytics tools. This enabled me to have a best practice deployment of both Google Analytics and Yahoo Web Analytics.

Site B is approximately 100x larger in terms of content and traffic. Its main objective is to drive sales via online transactions. For this site, I had complete control of the Google Analytics implementation, with no control of the SiteCensus implementation, as this was before my time. However, as the tool was professionally installed, I am assuming it was a best practice deployment.

Both tools use first party cookies for visitor tracking.

Results

I analysed the reports for August, September and October 2008, and took the average of the reported differences between the two tools for the following 5 metrics:

  • Visits – also known as the number of sessions
  • Pageviews – also known as the number of page impressions
  • Pages/visit – the average number of pageviews per visit
  • Time on site – the average time a visit lasts
  • Unique visitors

Comparing each of the three months separately was preferred to simply comparing a single, longer 90-day period, in order to mitigate any outliers. A monthly interval also reduced the effect of cookie deletion on the tracking of unique visitors – the longer the time period, the greater the chance of the original ID cookie being lost or deleted, and therefore the greater the inflation of the unique visitor count. The approach was validated by selecting weekly comparisons at random; these showed almost identical differences to the monthly comparisons.
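To illustrate the scale of the cookie deletion effect, here is a minimal simulation sketch. The figures are purely illustrative assumptions (1,000 real visitors who each return daily, with a 1% chance per day of clearing their cookies) rather than measurements from either site:

```python
import random

def reported_uniques(real_visitors=1000, days=30, daily_deletion_rate=0.01, seed=42):
    """Simulate how cookie deletion inflates the reported unique visitor
    count as the reporting window grows: each deletion means the next
    visit sets a fresh ID cookie, so one real visitor is counted again."""
    random.seed(seed)
    ids_issued = 0
    for _ in range(real_visitors):
        ids_issued += 1  # the first visit sets the initial ID cookie
        for _ in range(days):  # assume the visitor returns every day
            if random.random() < daily_deletion_rate:
                ids_issued += 1  # cookie deleted; a new ID is set on return
    return ids_issued

for window in (7, 30, 90):
    uniques = reported_uniques(days=window)
    print(f"{window}-day window: {uniques} reported uniques "
          f"({(uniques - 1000) / 1000:+.1%} inflation)")
```

Even a small daily deletion rate compounds over a longer reporting window, which is why the monthly interval was used. The results of the comparisons are shown in Tables 1 and 2.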

Table 1: Google Analytics versus Nielsen SiteCensus
(each monthly figure is the ratio Google Analytics ÷ Nielsen SiteCensus)

                   August   September   October   Average difference
  Visits            1.18      1.09       1.08          +11.7%
  Pageviews         1.22      1.10       1.11          +14.3%
  Pages/visit       1.03      1.02       1.02           +2.3%
  Time on site      1.13      1.12       1.12          +12.3%
  Unique visitors   1.09      1.02       1.02           +4.3%


Table 2: Google Analytics versus Yahoo Web Analytics
(each monthly figure is the ratio Google Analytics ÷ Yahoo Web Analytics)

                   August   September   October   Average difference
  Visits            0.95      0.95       0.98           -4.0%
  Pageviews         1.07      1.07       1.11           +8.3%
  Pages/visit       1.12      1.12       1.13          +12.3%
  Time on site      1.25      1.46       1.18          +29.7%
  Unique visitors   0.95      0.96       0.99           -3.7%
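For clarity, the "Average difference" column in both tables is the mean of the three monthly ratios, expressed as a percentage difference. A minimal sketch of the arithmetic, using the Visits row of Table 1 as a worked example:

```python
def average_difference(monthly_ratios):
    """Convert monthly tool-versus-tool ratios into an average
    percentage difference, e.g. [1.18, 1.09, 1.08] -> +11.7%."""
    mean_ratio = sum(monthly_ratios) / len(monthly_ratios)
    return (mean_ratio - 1.0) * 100.0

# Visits row of Table 1: Google Analytics ÷ Nielsen SiteCensus
print(f"{average_difference([1.18, 1.09, 1.08]):+.1f}%")  # prints +11.7%
```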


Observations & Comments

For both sites there is a relatively low variance (spread of values) in the metrics reported by the two tools running side by side.

Site B

  • Google Analytics reported slightly higher numbers than Nielsen SiteCensus for all 5 metrics. The largest difference for any one month is a 22% higher pageview count as measured by Google Analytics.
  • On further investigation, it was discovered that the page tagging of SiteCensus was not 100% complete. That is, some pages were missing the data collection tags for SiteCensus*. This was due to the use of multiple content management systems subsequent to the initial implementation. Therefore metrics would be undercounted by SiteCensus compared with the more complete (and verified) Google Analytics deployment.

    *It was estimated that approximately 6% of pages were missing the SiteCensus page tag, though how many monthly pageviews this related to is unknown. The checking of page tags was achieved using the methods discussed in Troubleshooting Tools for Web Analytics; a minimal sketch of such an audit is shown below.
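As an illustration, the tag check can be scripted as follows. This is a minimal sketch: the URL list is hypothetical (a real audit of 12,000+ pages would work from the full sitemap), and the signature substrings are simply the typical collection-server references for each tool:

```python
import urllib.request

# Illustrative tag signatures: substrings whose presence in the page
# source suggests the vendor's data collection tag is installed.
TAG_SIGNATURES = {
    "Google Analytics": "google-analytics.com",
    "Nielsen SiteCensus": "imrworldwide.com",
}

def audit_pages(urls):
    """Fetch each page and report any missing tracking tags."""
    for url in urls:
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except OSError as err:  # covers URLError/HTTPError and timeouts
            print(f"{url}: fetch failed ({err})")
            continue
        missing = [tool for tool, sig in TAG_SIGNATURES.items() if sig not in html]
        print(f"{url}: " + (", ".join(missing) + " missing" if missing else "all tags present"))

# Hypothetical URL list, e.g. extracted from the site's sitemap.xml
audit_pages(["https://www.example.com/", "https://www.example.com/products/"])
```

Note this only detects tags in the static page source; tags inserted dynamically by other scripts would need a browser-based check.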

Site A

  • Google Analytics reported slightly lower numbers than Yahoo Web Analytics for visits and unique visitors, and slightly higher numbers for the other metrics. The variance is also low, except for time on site, which is significantly higher as reported by Google Analytics – as much as 46% higher than Yahoo Web Analytics in a single month.
  • As both tools were verified as having a best practice deployment of page tags, the conclusion was that the large differences for time on site must be due to the different ways the respective tools calculate this metric.

    For Yahoo Web Analytics, the time on site is calculated as: time of last pageview – time of first pageview

    For Google Analytics, the time on site is calculated as: time of last hit – time of first hit

    A Google Analytics “hit” is the information broadcast by the tracking code page tag. This can be any of the following: a pageview, a transaction item, the setting of a custom variable, an event (e.g. file download), or the clicking of an outbound link.

    For Site A, custom variables, PDF file downloads and outbound links are all tracked with Google Analytics and make up a significant proportion of visitor activity; these were not being tracked with Yahoo Web Analytics. In particular, PDF file downloads are thought to have the highest impact on the calculated time on site, as a visitor is likely to pause to read these before continuing to browse the web site. File downloads are therefore likely to account for the larger discrepancies in this metric.
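To make the difference in calculation concrete, here is a minimal sketch with made-up timestamps, showing how the same session produces two different time on site figures depending on whether only pageviews or all hits are counted:

```python
# One session's hits: (seconds since session start, hit type).
# The timestamps are made up for illustration.
session_hits = [
    (0,   "pageview"),
    (45,  "pageview"),
    (50,  "event"),    # e.g. a PDF file download
    (230, "event"),    # e.g. an outbound link click
]

def time_on_site(hits, pageviews_only):
    """Time of last hit minus time of first hit, optionally
    restricted to pageview hits only."""
    times = [t for t, kind in hits if not pageviews_only or kind == "pageview"]
    return max(times) - min(times)

print("Pageview-based (Yahoo Web Analytics style):", time_on_site(session_hits, True), "seconds")
print("Hit-based (Google Analytics style):", time_on_site(session_hits, False), "seconds")
```

Here the visit's last recorded activity is an outbound click at 230 seconds, so the hit-based calculation reports 230 seconds against the pageview-based 45 seconds; the more non-pageview hits a site tracks, the larger this gap becomes.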

Conclusions

The methodology of page tagging with JavaScript in order to collect visit data has now been well established over the past 8 years or so. Given a best practice deployment of Google Analytics, Nielsen SiteCensus or Yahoo Web Analytics, high level metrics remain comparable. That is, they can be expected to lie within 10-20% of each other. This is surprisingly close given the plethora of accuracy assumptions that need to be considered when comparing different web analytics tools.

As tracking becomes more detailed – for example, tracking transactions, custom variables, events and outbound links – the discrepancies in metrics between web analytics tools will grow.

Extrapolating this study to all page tag web analytics tools that use first party cookies (if a tool does not use first party cookies, why not?), high level metrics from different web analytics tools should be comparable. The large caveat is knowing whether a deployment of the tool follows best practice recommendations. Often this is the greatest limitation, which is why so few comparison studies are available.

Comment

Accuracy in the web analytics world is still a hot debate, though for the wrong reasons in my view.

For a business analyst, reporting a 10-20% error bar when comparing two measurement tools may sound large. However, it pales into insignificance when compared to the vagaries of tracking offline marketing metrics such as newspaper readership figures or TV viewing data. This is especially so considering the link between readership/viewing figures and any actual "engagement" with the associated advertising is so tenuous.

Are you losing the battle with marketing teams obsessed with "uniques", or winning the war of tracking trends and KPIs? I would be interested in your comments.