Another Good Reason To Use The Google Analytics API
Mar 16
Google Analytics’ online interface offers a wealth of easily accessible, business-oriented data and reporting about the usage of your website. But it has its limitations. Recently, when attempting to segment branded and non-branded keyword traffic for one of our larger clients (which, by the way, is a good idea for anyone who wants to understand whether their SEO efforts are gaining traction), we ran into one such limitation.
Here’s the scenario: in order to capture all of the possible brand-related terms associated with our client, we created regular expressions matching variations of each (20 in all) and plugged them into two Advanced Segments in Google Analytics (one segment to exclude those terms, and one to match only those terms). After quite a bit of work (identifying the branded terms, writing the RegExes, creating the Advanced Segments, etc.), we clicked the Advanced Segments “on” with eager anticipation. And voila! Here’s the spiffy chart that appeared:

Hooray! Just what we wanted – visitor trends by branded and non-branded keyword traffic! But wait, what about that little yellow box of fine print? It reads: “This report is based on sampled data. Learn more.” So what exactly does that mean to us? Check out the data table that Analytics presented us with:

The numbers for branded and non-branded traffic *should* add up to 100% of the visits… but they don’t. See those little yellow boxes next to the segmented data? Those show the margin of error introduced by Google’s data sampling. Basically, the statement “This report is based on sampled data” means that the numbers aren’t as precise as we might like them to be. In fact, the margin of error on some of our results was over 70%! Kind of a big deal…
After a little experimentation, it became clear that neither the size nor the complexity of our Advanced Segments was triggering the use of sampling. Instead, the determining factor turned out to be the number of visits contained in the selected date range: sampling kicked in whenever the total exceeded 500,000 visits. In order to return reports for large data sets quickly, Google employs sampling whenever the online interface has to generate a report that isn’t automatically compiled.
The take-home message? If you’ve got large volumes of Analytics data that you want to slice and dice, you might be better served by pulling the raw data from Google using the Analytics API and performing the calculations on your own!
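To give a feel for the "perform the calculations on your own" part, here is a minimal Python sketch of the branded vs. non-branded split. It assumes you have already pulled keyword-level visit counts out of the API into simple (keyword, visits) pairs; the API call itself is not shown. The brand patterns, the sample rows, and the `split_branded` helper are all made up for illustration and are not our client's real terms.

```python
import re

# Hypothetical brand patterns for an imaginary "Acme" brand.
# (In our real project there were about 20 of these.)
BRAND_PATTERNS = [
    r"acme",        # the plain brand name
    r"ac\s+me",     # brand name typed with a space
    r"akme",        # a common misspelling
]

# One combined, case-insensitive regex: a keyword is "branded"
# if any of the patterns matches anywhere inside it.
BRAND_RE = re.compile(
    "|".join(f"(?:{p})" for p in BRAND_PATTERNS), re.IGNORECASE
)


def split_branded(rows):
    """Sum visits into branded and non-branded buckets.

    rows: iterable of (keyword, visits) pairs, e.g. exported from the API.
    """
    totals = {"branded": 0, "non_branded": 0}
    for keyword, visits in rows:
        bucket = "branded" if BRAND_RE.search(keyword) else "non_branded"
        totals[bucket] += visits
    return totals


# Made-up sample data standing in for the exported keyword report.
rows = [
    ("acme widgets", 1200),
    ("buy widgets online", 800),
    ("Acme.com login", 300),
    ("cheap blue widgets", 700),
]

totals = split_branded(rows)
all_visits = sum(visits for _, visits in rows)

# Every keyword lands in exactly one bucket, so the two totals
# always add up to 100% of the visits in the input.
assert totals["branded"] + totals["non_branded"] == all_visits

for bucket, visits in totals.items():
    print(f"{bucket}: {visits} visits ({visits / all_visits:.1%})")
```

Because each keyword goes into exactly one bucket, the two totals always account for every visit you feed in, which is precisely the property the sampled report in the online interface failed to give us.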













