
Hosted v Software v Hybrid tools

Categories: GA specific, General web analytics, Urchin software specific

My colleague Avinash recently presented his thoughts on the current vendor space at SES San Jose, covering Visual Sciences, Omniture, IndexTools, Clicktracks, WebTrends and Google Analytics. As always, his talk was engaging and thought-provoking. For me, though, one slide really stood out: the idea that a HYBRID web analytics tool can't hunt. You need to view his presentation to follow that, but essentially the analogy is that HYBRIDs make poor web analytics tools. As Avinash knows, I disagree with this point of view, so I want to explain why here.

By HYBRID tool, what is generally meant is the combination of page tagging with logfile data to produce cookie-fortified logfiles. This was discussed in a white paper, Web Analytics Data Sources, before I joined Google. There are significant advantages to doing this, as shown in the diagram below. Essentially, a hybrid allows you to combine the benefits of both techniques to give you the most complete picture of visitor activity on your web site.

[Diagram: Hosted v Software v Hybrid tools]

Key HYBRID benefits over and above a page tag only system include:

  • You own the collected data in the most direct sense of the word and can therefore reprocess it at will
  • Being able to track search engine robot activity
  • All downloaded files are tracked automatically without any modification of page html content
  • Partial file downloads can be tracked e.g. partial views of PDF files
  • Error pages can be tracked automatically without any modification of page html content
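To make the list concrete, here is a minimal sketch (JavaScript, run under Node.js) of the kind of summary only a logfile-based or hybrid system can produce. It assumes each logfile line has already been parsed into a record with hypothetical fields path, status, userAgent and visitorId (the cookie value set by the page tag and written into the logfile); the robot pattern is illustrative rather than a complete robot list.

```javascript
// Assumed record shape (hypothetical): { path, status, userAgent, visitorId }
// visitorId is the page-tag cookie value stored in the logfile ("cookie fortified").
const ROBOT_PATTERN = /bot|crawler|spider|slurp/i; // illustrative, not exhaustive
const DOWNLOAD_EXTENSIONS = [".pdf", ".doc", ".xls", ".exe", ".zip"];

function summariseHybridLog(records) {
  const summary = { robots: 0, downloads: 0, partialDownloads: 0, errors: 0, visitors: new Set() };
  for (const r of records) {
    // Robots rarely execute JavaScript, so only the logfile side sees them.
    if (ROBOT_PATTERN.test(r.userAgent)) {
      summary.robots++;
      continue;
    }
    if (r.visitorId) summary.visitors.add(r.visitorId);
    const isDownload = DOWNLOAD_EXTENSIONS.some((ext) => r.path.toLowerCase().endsWith(ext));
    // HTTP 206 (Partial Content) marks a partial download, e.g. a PDF read page by page.
    if (isDownload && r.status === 206) summary.partialDownloads++;
    else if (isDownload && r.status === 200) summary.downloads++;
    // Error pages are logged by the server whether or not they carry a page tag.
    if (r.status === 404) summary.errors++;
  }
  return summary;
}

const sampleRecords = [
  { path: "/index.html", status: 200, userAgent: "Mozilla/5.0", visitorId: "v1" },
  { path: "/report.pdf", status: 206, userAgent: "Mozilla/5.0", visitorId: "v1" },
  { path: "/setup.exe", status: 200, userAgent: "Mozilla/5.0", visitorId: "v2" },
  { path: "/missing.html", status: 404, userAgent: "Mozilla/5.0", visitorId: "v2" },
  { path: "/index.html", status: 200, userAgent: "Googlebot/2.1", visitorId: null },
];
const result = summariseHybridLog(sampleRecords);
console.log(result.robots, result.downloads, result.partialDownloads, result.errors, result.visitors.size);
```

The visitor count relies on the page tag's cookie; without it, a pure logfile tool would have to group requests by cruder means such as IP address and user agent.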

So a HYBRID technique offers real benefits. However, “with such great power comes great responsibility” (Spiderman!) which for a HYBRID web analytics tool means you take responsibility for:

  • Applying HYBRID software updates
  • Archiving and compressing your logfiles (which get very large very quickly)
  • Protecting end-user privacy - you have a legal responsibility to protect the privacy of your visitors and store logfile data securely.
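Logfile compression is one of those chores that is easy to automate. A minimal Node.js sketch using the built-in zlib module is shown below; it works on an in-memory string of made-up log lines purely for illustration, whereas a real archiving job would more likely stream files from disk (for example with zlib.createGzip()).

```javascript
const zlib = require("zlib");

// Compress logfile text for archiving (in memory, for illustration only).
function archiveLog(text) {
  return zlib.gzipSync(Buffer.from(text, "utf8"));
}

// Restore archived text so it can be reprocessed later.
function restoreLog(compressed) {
  return zlib.gunzipSync(compressed).toString("utf8");
}

// Made-up, highly repetitive sample; logfile text is typically repetitive, which is why it compresses well.
const sampleLog =
  '192.0.2.1 - - [03/Nov/2008:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120\n'.repeat(1000);
const archived = archiveLog(sampleLog);
console.log(sampleLog.length + " bytes -> " + archived.length + " bytes");
```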

HYBRIDS require a significant IT investment to run smoothly, which many organisations struggle to justify - hence the proliferation of page tag technique adoptions. Nonetheless, a HYBRID method remains an effective technique for improving the accuracy of either a page tag or logfile solution.

Are you using (or have you used) a HYBRID method, or perhaps some other technique, to improve accuracy? Share your thoughts with a comment.

Tracking links to direct downloads - Automatically

Categories: GA Hacks, GA specific

My standard word of caution: this is a tech tip and requires you to have a knowledge of HTML and JavaScript to implement and use it…

[Update 03-Nov-2008: This hack is for the legacy urchin.js tracking code.
Always refer to the Scripts & Downloads section for the latest version.
]

Following on from my previous post, Tracking banners and other outgoing links automatically, this GA hack allows you to track downloads automatically. As you may know, tracking download files such as PDF, EXE, DOC and XLS can be achieved quite easily by modifying each link to include an urchinTracker call that logs a virtual pageview. However, as with tracking outgoing links, manually modifying each download link becomes inefficient when there are large numbers of ever-changing files to track. You can overcome this by applying the JavaScript code below:

<script type="text/JavaScript">
// Only links already written to the page (i.e. in the DOM) will be tagged.
// The script can be called multiple times.

function addExtDocEvents() {
	var as = document.getElementsByTagName("a");
	var extTrack = ["mysite.com"];
	// replace mysite.com with your web site domain
	var extDoc = [".doc", ".xls", ".exe", ".zip", ".pdf", ".js"];
	// add further document types as required

	for (var i = 0; i < as.length; i++) {
		var tmp = as[i].getAttribute("onclick");
		// skip links that already call urchinTracker
		if (tmp != null && tmp.indexOf('urchinTracker') > -1) continue;

		// Tracking electronic documents - doc, xls, pdf, exe, zip, js
		for (var j = 0; j < extDoc.length; j++) {
			if (as[i].href.indexOf(extTrack[0]) != -1 &&
				as[i].href.indexOf(extDoc[j]) != -1) {
				// keep only the part of the URL after your domain, e.g. /files/report.pdf
				var splitResult = as[i].href.split(extTrack[0]);
				as[i].setAttribute("onclick", ((tmp != null) ? tmp : "") +
					"urchinTracker('/downloads" + splitResult[1] + "');");
				break;
			}
		}
	}
}
addExtDocEvents()
</script>

The script works by looking for links within the browser's Document Object Model (DOM) that point to your own domain (the first entry of extTrack) and contain one of the file extensions listed in the array extDoc. Each matching link is modified to include the urchinTracker call. By this method, all file downloads will be reported as:

/downloads/the-url-that-is-clicked-on

Here, the-url-that-is-clicked-on is the part of the link that follows your domain name, for example /downloads/files/report.pdf. You can modify the JavaScript to adjust the path as required.
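The matching rule can be pulled out into a small pure function, which is handy for checking which links will be tagged before deploying the script. The sketch below mirrors the logic of addExtDocEvents(); the function name downloadPageview and the constant SITE_DOMAIN are hypothetical, and mysite.com is a placeholder for your own domain.

```javascript
// Hypothetical helper mirroring addExtDocEvents(): the link must point at your
// own domain and contain one of the listed extensions.
const SITE_DOMAIN = "mysite.com"; // replace with your own domain
const EXT_DOC = [".doc", ".xls", ".exe", ".zip", ".pdf", ".js"];

function downloadPageview(href) {
  if (href.indexOf(SITE_DOMAIN) === -1) return null; // not one of your files
  if (!EXT_DOC.some((ext) => href.indexOf(ext) !== -1)) return null; // not a download
  // Everything after the domain becomes the virtual pageview path.
  return "/downloads" + href.split(SITE_DOMAIN)[1];
}

console.log(downloadPageview("http://www.mysite.com/files/report.pdf"));
console.log(downloadPageview("http://www.mysite.com/about.html"));
```

Because matching is a plain substring test, ".js" will also match URLs containing ".jsp" or ".json", and an extension appearing anywhere in the URL (for example in a query string) counts as a match; tighten the test if that matters for your site.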

IMPORTANT: As with the tracking of banners and other outgoing links, the position of this code within your page is important. It must be placed after your call to the GATC. Alternatively, you can place the addExtDocEvents() call in an onLoad event handler and host the JavaScript in a separate file. As an example, assuming the JavaScript is hosted in a file called trackExternal.js, this looks as follows:

<script src="http://www.google-analytics.com/urchin.js" type="text/JavaScript">
</script>
<script src="/trackExternal.js" type="text/JavaScript"></script>
<script type="text/JavaScript">
	_uacct = "UA-XXXXX-Y";
	urchinTracker();
</script>

<body onLoad="addExtDocEvents()">
	...your remaining web page content...
</body>

A note on performance: each time your page loads, this script goes through every link on the page to check whether it is a download. Clearly, the more links on your page, the harder the script must work. As long as each page has links numbering in the hundreds rather than thousands, performance should not be a problem. Also, on pages with a large number of links, it is possible that visitors will click a download link before the script has modified it. Such click-throughs will not be tracked by Google Analytics.

***Update***
I have now combined this hack for downloads with the Tracking of banners and other outgoing links - automatically into a single file that is available in the Scripts & Downloads section.

Did you find this tip useful? All tips are being grouped under the category GA Hacks. Please provide your feedback with a comment.

What is the 2nd thing to do when considering a web analytics implementation?

Categories: GA Implementation ABCs, GA specific


[This article is part of a series entitled: GA Implementation ABCs]

In my first post of this series, What is the 1st thing to do when considering a web analytics implementation?, I discussed how important it is simply to get initial data in, before tackling the much wider (and also more complex) issues of mapping your stakeholders, building your KPIs or assessing your business needs from your web site. Essentially, my view is: get an initial feel for the project by getting the data in, and that means tagging all your site pages (including tracking non-standard pages such as PDF, EXE and ZIP files).

With data coming in, the 2nd thing to consider is adding filters. Filters in Google Analytics serve many purposes, such as segmentation and report augmentation. In this post I focus on their role in data cleansing. Keeping the data ‘clean’ means removing visits that are unwanted or invalid. Essentially, this improves the signal-to-noise ratio of your data. Having clear signals means you don't waste time analysing what could be random events (noise) on your web site.

Example cleansing filters include:

  1. Your own access to your web site
    - this can be a significant volume of non-converting traffic if your employees set their browser home page to be the company web site. Such visits inflate your visitor and pageview counts and decrease your conversion rates.
  2. Your web developers/designers updating content
    - these visits can be significant in volume but, more importantly, web developers are likely to update conversion pages, triggering goals and inflating your conversion rate metrics.
  3. Data contamination
    - other web sites copying your GATC, either deliberately or accidentally, which results in meaningless data being mixed with your web site visit data.
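To see why item 1 matters, consider some purely illustrative numbers: 1,000 customer visits producing 20 conversions, plus 250 unfiltered employee visits that never convert. The short JavaScript calculation below shows the reported conversion rate dropping from 2.0% to 1.6% even though customer behaviour has not changed.

```javascript
// Purely illustrative numbers, not real data.
function conversionRate(conversions, visits) {
  return visits === 0 ? 0 : conversions / visits;
}

const customerVisits = 1000;
const customerConversions = 20;
const employeeVisits = 250; // e.g. staff whose browser home page is the company site; they never convert

const cleanRate = conversionRate(customerConversions, customerVisits);
const reportedRate = conversionRate(customerConversions, customerVisits + employeeVisits);
console.log((cleanRate * 100).toFixed(1) + "% vs " + (reportedRate * 100).toFixed(1) + "%");
```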

All 3 of these should be removed by adding 3 filters to your GA configuration as follows:

Filter 1: removing yourself from the reports

Excluding known visitors is very straightforward. If visitors connect to the Internet via a fixed IP address, you simply select the predefined filter ‘Exclude All Traffic from an IP Address’ from the Filter Manager, as shown:

[Screenshot: filter to exclude an IP address from Google Analytics]

Excluding visits from employees, your search marketing agency or any known third party, such as your web developers, is an important step when first creating your profiles. These visitors generate a relatively high number of pageviews in areas that greatly impact key metrics, such as your conversion rates. For example, employees with their browser home page set to the company web site will show in your reports as a returning visitor every time they open their browser, and most likely as a one-page visitor. Remember that the GATC deliberately breaks through any caching, so every one of these employee pageviews is recorded; it is therefore important to separate employee visits from those of potential customers.

Similarly, web developers heavily test checkout systems for troubleshooting purposes. These tests also trigger GATC page requests, most likely for your goal conversion pages. You should therefore remove all such visits from your reports.

Filter 2: removing your designers/developers from the reports

This is simply an extension of Filter 1, using the IP address of your agency in place of your own office. But what if IP addresses change each time they log in? I will discuss this scenario in a later post. However, the vast majority of business broadband lines use fixed IP addresses, so you should be OK.
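If you have several fixed addresses to exclude (your office and your agency, say), one option is a single custom exclude filter on the visitor IP address field with a combined regular expression. The JavaScript sketch below builds such a pattern; the helper name ipExcludePattern is hypothetical, the addresses come from the reserved documentation ranges (substitute your own), and it assumes the filter treats the pattern as a standard regular expression.

```javascript
// Hypothetical helper: combine fixed IP addresses into one anchored pattern
// suitable for a custom exclude filter on the visitor IP address.
function ipExcludePattern(ips) {
  // Escape each dot so it matches a literal "." rather than any character.
  const escaped = ips.map((ip) => "^" + ip.replace(/\./g, "\\.") + "$");
  return escaped.join("|");
}

const pattern = ipExcludePattern(["192.0.2.10", "198.51.100.25"]); // office, agency (examples)
console.log(pattern);
```

Escaping the dots stops them matching any character, and the ^ and $ anchors stop 192.0.2.10 from also matching 192.0.2.100.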

Filter 3: removing any contaminated data
This filter is to ensure that your data, and only your data, is collected into your Google Analytics profile. For example, it is possible for another web site owner to copy your GATC onto their own pages - therefore contaminating your data with their own web site traffic. The simple include filter shown below applied to your Google Analytics profile will ensure only traffic to the mysite.com domain is reported on.

[Screenshot: filter to include only your own web site traffic in Google Analytics]

Of course it may be desirable to collect data from multiple web sites into one profile. In that case, add the multiple domains in the Filter Pattern separated with a | character, for example:

Filter Pattern: mysite\.com|yoursite\.com
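Before saving an include filter, it can be worth checking the pattern against a few hostnames. The JavaScript snippet below tests the Filter 3 pattern (with a single backslash before each dot) against some made-up hostnames; it assumes the filter's regular expression behaves like a JavaScript RegExp for a simple pattern such as this.

```javascript
// Filter 3 pattern as a JavaScript RegExp (a single backslash escapes each dot).
const includePattern = /mysite\.com|yoursite\.com/;

// Made-up hostnames to check against the pattern.
const hostnames = ["www.mysite.com", "shop.yoursite.com", "www.copycat.net", "mysiteXcom.org"];
for (const host of hostnames) {
  console.log(host + ": " + (includePattern.test(host) ? "kept" : "filtered out"));
}
```

The pattern is unanchored, so subdomains such as shop.yoursite.com are kept, but so would be a hostname such as mysite.com.example.net; add anchors if that is a concern.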

Important tip:
It is important to note that when a filter is created within Google Analytics, it’s immediately applied to new data coming into your account. New filters will not affect historical data, and it is not possible to reprocess your old data through the new filter. Therefore, always keep “raw” data intact - that is, keep your original web site profile and apply new filters to a duplicate profile in your account.

How have you approached the signal-to-noise ratio problem? The vast majority of Google Analytics installations I come across have no filters applied, why is this? Please add your thoughts with a comment.

Customising the list of recognised search engines

Categories: GA Hacks, GA specific

My standard word of caution: this is a tech tip and requires you to have a knowledge of HTML and JavaScript to implement and use it…

[Update 03-Nov-2008: This hack is for the legacy urchin.js tracking code.
For the ga.js version read: Customising the list of search engines in Google Analytics.
]

Google Analytics shows which search engine your visitors/customers have used in the Traffic Sources > Search Engines report. To view the list of all the search engines that Google Analytics currently identifies by default, simply load into your browser http://www.google-analytics.com/urchin.js. In this file you will see the section commented as:

//-- Auto/Organic Sources and Keywords

This section defines the organic search engines that, once captured by the Google Analytics Tracking Code (GATC), are shown in the reports interface. Looking at this section, you will notice that 28 organic search engines are currently detected by default, i.e. _uOsr[0] to _uOsr[27].

Of course, Google recognises that there are a great many more search engines in the world: language- and region-specific ones, as well as niche search engines such as price comparison sites and vertical portals. It is therefore possible to modify and append to the array of recognised search engines, and there are two methods.

1. The standard method of adding additional search engines to GA
Add the following code to your page GATC:

_uOsr[28]="search_engine_name";
_uOkw[28]="query_variable"; 

The value for _uOsr is the domain name (sub-domain or part of the domain to match) of the search engine and the value for _uOkw is the query variable which stores the keyword (replace search_engine_name and query_variable in the example above).
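To illustrate how the two parallel arrays work together, here is a simplified JavaScript model: the first entry whose _uOsr value appears in the referring hostname wins, and the matching _uOkw entry names the query variable that holds the keyword. This is an assumption-laden sketch for illustration only, not the actual matching code in urchin.js; the function name detectOrganic is hypothetical.

```javascript
// Simplified model (not the urchin.js source): osr[i] names the engine,
// okw[i] names the query variable that carries the search keyword.
function detectOrganic(referrer, osr, okw) {
  const url = new URL(referrer);
  for (let i = 0; i < osr.length; i++) {
    if (url.hostname.indexOf(osr[i].toLowerCase()) !== -1) {
      const keyword = url.searchParams.get(okw[i]);
      if (keyword !== null) return { engine: osr[i], keyword: keyword };
    }
  }
  return null; // no match: treated as an ordinary referral
}

const osr = ["google", "bbc"];
const okw = ["q", "q"];
console.log(detectOrganic("http://search.bbc.co.uk/cgi-bin/search/results.pl?q=motorcycle", osr, okw));
```

In this sketch, requiring the query variable to be present means a non-search page on the same domain falls through to an ordinary referral.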

The number in square brackets should start at 28 (or one more than whatever the last number is, if the default list has been updated) and increase by 1 for each additional search engine added (29, 30, 31, etc.).

For example, if someone searched for “motorcycle” and the search result URL is:
http://search.bbc.co.uk/cgi-bin/search/results.pl?q=motorcycle

you would add the following line to your tracking code on your pages:

<script src="http://www.google-analytics.com/urchin.js" type="text/JavaScript"></script>
<script type="text/JavaScript">
     _uacct = "UA-xxxx-x";
     _uOsr[28]="BBC"; _uOkw[28]="q";
     urchinTracker();
</script> 

Another example where this technique is useful is to add price comparison engines as a regular search engine:

<script type="text/JavaScript">
     _uacct = "UA-xxxx-x";
     _uOsr[29]="Kelkoo"; _uOkw[29]="siteSearchQuery";
     urchinTracker();
</script> 

By this method, Kelkoo is listed in the Search Engine Marketing report alongside other search engines. That is useful in itself, but what provides more insight is that the corresponding Kelkoo search terms used by visitors are also listed in the Keywords report. Without this little hack, Kelkoo would simply be listed as a standard referrer and no search terms would be logged.

Apart from adding search engines to the existing list provided by Google Analytics, you can also use this method to create regional variants of the main players. For example, if you are based in the UK, being able to differentiate google.co.uk from google.com may be important. In that case, you would add the following to the GATC of each page:

<script type="text/JavaScript">
     _uacct = "UA-xxxx-x";
     _uOsr[0]="google.co.uk"; _uOkw[0]="q";
     urchinTracker();
</script>

Note: When adding regional variations to the search engine list, the order of the _uOsr and _uOkw arrays becomes important. google.co.uk must be listed before the “catch-all” match of google, which of course requires re-numbering the search engine list array.
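If you would rather not re-number by hand, the insertion can be automated. The JavaScript sketch below (the helper addRegionalEngine is hypothetical) places a regional entry such as google.co.uk immediately before the first existing entry it contains, such as the catch-all google, and shifts the later entries along so that both arrays stay aligned. The sample arrays are illustrative, not the real default list.

```javascript
// Hypothetical helper: insert a regional engine ahead of the catch-all entry it
// would otherwise be shadowed by, keeping both parallel arrays aligned.
function addRegionalEngine(osr, okw, name, queryVar) {
  let pos = osr.findIndex((existing) => name.indexOf(existing) !== -1);
  if (pos === -1) pos = osr.length; // nothing would shadow it: append at the end
  osr.splice(pos, 0, name); // splice shifts later entries along (the re-numbering)
  okw.splice(pos, 0, queryVar);
  return pos;
}

const osr = ["google", "yahoo", "msn"]; // illustrative sample list
const okw = ["q", "p", "q"];
addRegionalEngine(osr, okw, "google.co.uk", "q");
console.log(osr, okw);
```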

2. A more robust method…
An example of a more complete and robust method of adding additional search engines to the list is shown at: www.advanced-web-metrics.com/scripts/custom_se.js. In this case, the list of custom/localised search engines is kept in a separate JavaScript file and referenced in each page within the GATC as follows:

<script src="http://www.google-analytics.com/urchin.js" type="text/JavaScript"></script>
<script src="http://www.mysite.com/custom_se.js" type="text/JavaScript"></script>
<!-- custom_se.js must be loaded after urchin.js -->

<script type="text/JavaScript">
	_uacct = "UA-xxxx-x";
	urchinTracker();
</script>

This script overwrites the default search engine array of Google Analytics and uses the array length (_uOsr.length) to increment its index so that re-numbering is not required when adding new entries. A sample of the code is provided below:

var _uOsr=new Array();
var _uOkw=new Array();

// Google EMEA Domains
_uOsr[_uOsr.length]="google.co.uk";	_uOkw[_uOkw.length]="q";
_uOsr[_uOsr.length]="google.es";	_uOkw[_uOkw.length]="q";
_uOsr[_uOsr.length]="google.pt";	_uOkw[_uOkw.length]="q";
_uOsr[_uOsr.length]="google.it";	_uOkw[_uOkw.length]="q";

… etc.

So by this method, you simply maintain a separate file for your list of search engines and you don’t need to worry about renumbering each time you update/append. Feel free to copy and use the one listed: http://www.advanced-web-metrics.com/scripts/custom_se.js.
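When appending by length, each array needs to be indexed by its own length. The JavaScript sketch below contrasts indexing _uOkw by _uOsr.length, which lands one slot too high because _uOsr has already grown by the time the second assignment runs, with indexing each array by its own length. The helper names are hypothetical; plain arrays stand in for _uOsr and _uOkw.

```javascript
// Pitfall: okw[osr.length] is evaluated after osr has already grown by one.
function appendMisaligned(osr, okw, name, queryVar) {
  osr[osr.length] = name;
  okw[osr.length] = queryVar; // one slot too high
}

// Fix: index each array by its own length so entry i of both arrays stays paired.
function appendAligned(osr, okw, name, queryVar) {
  osr[osr.length] = name;
  okw[okw.length] = queryVar;
}

const badOsr = [], badOkw = [];
appendMisaligned(badOsr, badOkw, "google.co.uk", "q");
console.log(badOkw[0], badOkw[1]); // undefined q

const goodOsr = [], goodOkw = [];
appendAligned(goodOsr, goodOkw, "google.co.uk", "q");
appendAligned(goodOsr, goodOkw, "google.es", "q");
console.log(goodOsr.length === goodOkw.length, goodOkw[0]); // true q
```

Calling push on both arrays would achieve the same alignment.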

Many thanks to the guys at GA-Experts.co.uk for their help with building and testing the custom_se.js file.

Did you find this tip useful? All tips are being grouped under the category GA Hacks. Please provide your feedback with a comment.

Tracking banners and other outgoing links - Automatically

Categories: GA Hacks, GA specific

A word of caution: this is a tech tip and requires you to have a knowledge of HTML and JavaScript to implement and use it…

[Update 03-Nov-2008: This hack is for the legacy urchin.js tracking code.
Always refer to the Scripts & Downloads section for the latest version.
]

Your site may offer visitors a link to click through to an external web site, such as a subsidiary, an affiliate, an advertiser or a trade organisation. For Google Analytics, tracking a visitor leaving your web site requires an edit to the page. This is achieved by modifying your outbound links to call urchinTracker and is extremely easy to do. However, what if your web site has hundreds of separate outgoing links that are constantly evolving and being appended to? Clearly, manually modifying each link becomes laborious and inefficient. To overcome this, you can apply the example JavaScript code below to your web site:

<script type="text/JavaScript">
// Only links already written to the page (i.e. in the DOM) will be tagged.
// The script can be called multiple times.

function addExtLinkerEvents() {
	var as = document.getElementsByTagName("a");
	var extTrack = ["mysite.com"];
	// replace mysite.com with your web site domain(s)

	for (var i = 0; i < as.length; i++) {
		var href = as[i].href;
		var tmp = as[i].getAttribute("onclick");
		// skip links that already call urchinTracker
		if (tmp != null && tmp.indexOf('urchinTracker') > -1) continue;
		// skip non-web links such as mailto: or javascript:
		if (href.indexOf("http") != 0) continue;

		// Track links off site (i.e. no GATC): the link must match none of the domains in extTrack
		var isInternal = false;
		for (var j = 0; j < extTrack.length; j++) {
			if (href.indexOf(extTrack[j]) != -1) {
				isInternal = true;
				break;
			}
		}
		if (!isInternal) {
			var splitResult = href.split("//");
			as[i].setAttribute("onclick", ((tmp != null) ? tmp : "") +
				"urchinTracker('/ext/" + splitResult[1] + "');");
		}
	}
}
addExtLinkerEvents()
</script>

The script works by looking for links within the browser’s Document Object Model (DOM) that do not match the domain value given in the variable array extTrack. If the link does not match extTrack then it is considered an external link and so is modified to include the urchinTracker call. By this method, all external links will show in the Google Analytics reports as:

/ext/the-url-that-is-clicked-on

Here, the-url-that-is-clicked-on is the full link URL minus ‘http://’. You can modify the JavaScript to adjust the path as required.
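As with downloads, the external-link rule can be expressed as a small pure function for testing outside the browser. The sketch below mirrors the intent of addExtLinkerEvents(): a link is external when it matches none of your own domains, and it is logged as /ext/ plus the URL minus the protocol. The names OWN_DOMAINS and externalPageview are hypothetical, and mysite.com is a placeholder.

```javascript
// Hypothetical helper mirroring addExtLinkerEvents().
const OWN_DOMAINS = ["mysite.com"]; // replace with your domain(s)

function externalPageview(href) {
  if (!/^https?:\/\//.test(href)) return null; // skip mailto:, javascript:, etc.
  const internal = OWN_DOMAINS.some((d) => href.indexOf(d) !== -1);
  // split("//")[1] drops the protocol (and anything after a second "//").
  return internal ? null : "/ext/" + href.split("//")[1];
}

console.log(externalPageview("http://www.partner.org/offers?id=7"));
console.log(externalPageview("http://www.mysite.com/contact.html"));
```

Note that split("//")[1] truncates any URL that contains a second "//" in its path; use indexOf and substring instead if such links occur on your site.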

IMPORTANT: The position of this code within your page is important. The code must be placed after your call to the Google Analytics Tracking Code (GATC). Alternatively, you can place the addExtLinkerEvents() call in an onLoad event handler and host the provided JavaScript in a separate file. As an example, assuming the JavaScript is hosted in a file called trackExternal.js, this looks as follows:

<script src="http://www.google-analytics.com/urchin.js" type="text/JavaScript">
</script>
<script src="/trackExternal.js" type="text/JavaScript"></script>
<script type="text/JavaScript">
	_uacct = "UA-XXXXX-Y";
	urchinTracker();
</script>

<body onLoad="addExtLinkerEvents()">
	...your remaining web page content...
</body>

A note on performance: each time your page loads, this script goes through every link on the page to check whether it is external. Clearly, the more links on your page, the harder the script must work. As long as each page has links numbering in the hundreds rather than thousands, performance should not be a problem.

Also note that on pages with a large number of links, it is possible that visitors will click an external link before the script has modified it. Such click-throughs will not be tracked by Google Analytics; this is an accuracy consideration that affects all web analytics vendors.

Did you find this tip useful? I am considering writing more of these tech tips if you feel they are useful. Please provide your feedback with a comment.

Copyright Advanced Web Metrics by Brian Clifton