Keeping a local copy of your Google Analytics data can be very useful for your organisation. For example, Google currently commits to keeping GA data for 25 months, which allows you to compare year-on-year reports. That is adequate for most users, but what if you wish to go further back? Wouldn’t it be useful to retain a local copy of the collected data?
Benefits of keeping a local copy of GA visitor data
What you can do with a local copy of your data:
- Gain greater control over your data
- Troubleshoot GA implementation issues
- Process historical data as far back as you wish - using Urchin
- Re-process data when you wish - using Urchin
The simple way to keep a local copy is to set the _userv variable in your Google Analytics Tracking Code (GATC) as follows:
_userv=2;
By setting this variable, GA visitor data is streamed to your web server logfiles at the same time as it is sent to Google Analytics for processing. A complete GATC to back up your GA data locally would look something like this:
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript"></script>
<script type="text/javascript">
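_uacct = "UA-XXXXX-1";  // your Google Analytics account ID
_userv = 2;             // 2 = log to your own web server as well as sending to Google Analytics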
urchinTracker();
</script>
This is simple to achieve because web servers log their activity by default, usually in plain text format. Once implemented, open your logfiles to verify the presence of additional __utm.gif entries that correspond to the visit data as seen by Google Analytics.
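On a busy site, checking by eye is tedious, so a small script can pull out the relevant lines instead. The following is a minimal sketch only: the function name findUtmGifEntries and the path access.log are hypothetical, and it assumes each log entry sits on a single line (the examples in this post are wrapped only for display).

```javascript
// Hypothetical helper: return only the log lines that record a __utm.gif request.
// Assumes plain-text logs with one entry per line.
function findUtmGifEntries(logText) {
  return logText
    .split(/\r?\n/)
    .filter(function (line) {
      return line.indexOf("/__utm.gif") !== -1;
    });
}

// Possible usage under Node.js ("access.log" is a placeholder path):
//   var fs = require("fs");
//   var hits = findUtmGifEntries(fs.readFileSync("access.log", "utf8"));
//   console.log(hits.length + " __utm.gif entries found");
```

If the count is zero after the updated GATC has been live for a while, that suggests the page tag is not reaching your own server.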
A typical Apache logfile line entry (line wrapped here) looks like this:
86.138.209.96 www.mysite.com - [01/Oct/2007:03:34:02 +0100] "GET /__utm.gif?utmwv=1&utmt=var&utmn=
2108116629 HTTP/1.1" 200 35 "http://www.mysite.com/pageX.htm" "Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1; SV1; .NET CLR 1.1.4322)" "__utma=1.117971038.1175394730.1175394730.1175394730.1;
__utmb=1; __utmc=1; __utmz=1.1175394730.1.1.utmcid=23|utmgclid=CP-Bssq-oIsCFQMrlAodeUThgA|
utmccn=(not+set)|utmcmd=(not+set)|utmctr=looking+for+site; __utmv=1.Section One"
[GA-Experts.co.uk has a best practice guide on configuring an Apache logfile format]
For Microsoft IIS, the format (line wrapped) can be as follows:
2007-10-01 01:56:56 68.222.73.77 - - GET /__utm.gif utmn=1395285084&utmsr=1280x1024&utmsa=1280x960
&utmsc=32-bit&utmbs=1280x809&utmul=en-us&utmje=1&utmce=1&utmtz=-0500&utmjv=1.3&utmcn=1&utmr
=http://www.yoursite.com/s/s.dll?spage=search%2Fresultshome1.htm&startdate=01%2F01%2F2010&
man=1&num=10&SearchType=web&string=looking+for+mysite.com&imageField.x=12&imageField.y=6&utmp
=/ 200 878 853 93 - - Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.0.3705;
+Media+Center+PC+3.1;+.NET+CLR+1.1.4322) - http://www.yoursite.com/
In both examples, the GATC augments the logged request with utmX name-value pairs. This is known as HYBRID data collection, the benefits of which are discussed in the post Software v Page Tags v HYBRIDS.
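To show how those pairs can be read back out of a logfile line, here is a rough sketch. The function name parseUtmPairs is hypothetical, and the code assumes the two layouts shown above: Apache writing the query string after a ?, and IIS writing it as a separate space-delimited field. Other log formats would need a different pattern.

```javascript
// Hypothetical helper: extract the utmX name-value pairs from one log line.
// Returns null when the line is not a __utm.gif request.
function parseUtmPairs(logLine) {
  // Apache logs "/__utm.gif?query"; IIS logs the path and the query as separate fields.
  var match = /\/__utm\.gif[?\s]([^\s"]+)/.exec(logLine);
  if (!match) {
    return null;
  }
  var pairs = {};
  match[1].split("&").forEach(function (part) {
    var eq = part.indexOf("=");
    if (eq <= 0) {
      return; // not a name=value pair (for example, the "-" IIS writes for an empty query)
    }
    var name = part.slice(0, eq);
    if (name.indexOf("utm") !== 0) {
      return; // keep only the utmX names added by the GATC
    }
    var value = part.slice(eq + 1);
    try {
      value = decodeURIComponent(value);
    } catch (e) {
      // Malformed escape sequence: keep the raw value.
    }
    pairs[name] = value;
  });
  return pairs;
}
```

Only names beginning with utm are kept, which is a deliberate simplification: in the IIS example the referrer in utmr carries its own unescaped parameters (startdate, man and so on), and those are dropped rather than mistaken for GATC fields.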
The benefits explained in detail
1. Greater control over your data
Some organisations simply feel more comfortable having their data sitting physically within their premises and are prepared to invest in the IT resources to do so. Of course, you cannot simply run this data through an alternative web analytics vendor, as the GATC page tag information will be meaningless to anyone else. However, it does give you the option of passing your data to a third-party auditing service such as ABCE. Audit companies are used to verify website visitor numbers, which is useful for content publishing sites that sell advertising and therefore wish to validate their rate cards.
NOTE: Be aware that when doing this, protecting the privacy of your end users (your visitors) is your responsibility and you should be transparent about this in your privacy policy.