
Archive for August, 2008

Personas and Google Analytics

Andrea Wiggins recently posted an excellent article on her blog on “Data-backed Personas,” in which she discusses using Google Analytics data as input to persona design. As someone who appreciates the relative confidence numbers provide, I found the article (and extensive related discussion, including on Jared Spool’s User Interface Engineering blog) to be enlightening regarding the value and methodology behind personas.

The article inspired me to think more about how personas can interface with web tracking tools like Google Analytics. The author suggested architecting the GA account to contain separate new and returning visitor profiles, then using the data within those profiles as input to persona creation, including geographic considerations as well as visitor needs revealed by segments such as keywords and landing pages. I immediately thought of the converse (or a continuation of the process): using the personas to structure the GA account once they have been created.

The ability to customize reporting around business goals is one of the key benefits Google Analytics can bring to a company. But the details of an account’s structure are often overlooked, on the assumption that the end reports will provide the data exactly as needed. GA is clunky in the respect that the particular segments you need to cross-reference sometimes can’t be displayed together in a single report.

After the personas are perfected, designing profiles around a few basic attributes of each, via fairly straightforward Custom Filters on dimensions such as geographic region and connection speed, could extend the life of the personas into analysis beyond the web design phase. This can greatly aid communication between consultant and business stakeholder by carrying consistent threads into the conversation. For a company working on its own without a dedicated web analyst, having profiles that correspond to specific user types can aid in troubleshooting, not to mention avoid the need for a more complicated configuration. When your time with the tool is limited, it is easy to emerge from an analysis session without a clear takeaway from what you saw. The memorability of a persona, and the ability to compare, for instance, top content or time spent on various pages between two different personas, could help the company make more efficient use of its time.

This is not to say that these should be the only profiles! A comprehensive profile, as well as others designed around organizational needs, is in order too.
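To make the idea concrete, here is a minimal sketch of the routing logic such persona profiles would express. It is plain Python, not the Google Analytics filter configuration itself, and the persona names, region code, and connection-speed labels are hypothetical stand-ins for whatever attributes your own personas define. Each profile is a predicate, and a visit counts toward every profile whose predicate it passes, with one unfiltered profile playing the role of the comprehensive profile.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Visit:
    region: str            # hypothetical region code, e.g. "US-MI"
    connection_speed: str  # hypothetical labels: "dialup", "dsl", "cable"
    returning: bool        # new vs. returning visitor


# Hypothetical personas, each reduced to a few attributes a simple
# include filter could test. "All visits" is the comprehensive profile.
PERSONA_FILTERS: Dict[str, Callable[[Visit], bool]] = {
    "All visits": lambda v: True,
    "Local returning customer": lambda v: v.region == "US-MI" and v.returning,
    "Rural dial-up researcher": lambda v: v.connection_speed == "dialup",
}


def route_visit(visit: Visit) -> List[str]:
    """Return the names of every profile whose filter this visit passes."""
    return [name for name, keep in PERSONA_FILTERS.items() if keep(visit)]


if __name__ == "__main__":
    print(route_visit(Visit(region="US-MI", connection_speed="cable", returning=True)))
```

Keeping each persona to two or three attributes is deliberate: the simpler the predicate, the more likely it maps onto a straightforward filter rather than a complicated configuration.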

How can your landing page convert visitors that don’t care?

Seth Godin posted a couple of thought-provoking blog posts recently about online ads: Ads are the new online tip jar and a follow-up, Beating the status quo.

What Godin proposes is, when you read a blog, “if you like what you’re reading, click an ad to say thanks.” If everybody engaged in this behavior, it would over time change the model of online advertising so that ads would pull in more people but they would be less well qualified.

He writes that your landing page gives you the opportunity to “immerse someone in an entire page you designed. In other words, a chance to convert mild interest into big interest.”

This is the part that is particularly interesting to me. The idea of a landing page (at least at this time) is that it fulfills the promise made in the ad the user just clicked on. The ad is meant to pull in only people who are interested in learning more. The landing page, in turn, is meant to speak to people who found the ad interesting and want to learn more about the advertised product or about a solution to a problem they have. Godin’s vision entails people clicking on ads they don’t actually care about.

We would be challenged to put together landing pages that speak to the casual visitor. We would have to grab the attention of these less-motivated visitors immediately to keep them from leaving. When talking about the product or service, different messages will appeal to different people, but the more messages you try to pack into a landing page, the harder it will be to get through to visitors.

The answer, then, may be to try to suck in visitors through content that is entertaining instead of presenting a purely informative page. Will that strategy work for every business, though? Perhaps the answer lies in crafting the message on a landing page to clearly and loudly tell visitors why this landing page is worth their time.

Godin’s idea intrigues me but I am not convinced. He proposes that the value of this new model to advertisers would be “begin[ning] to reach the unreachable non-clickers.” It will be hard to reach visitors whose motive for clicking on an ad is not to learn more about what you are advertising, but rather to generate income for the blogger whose work they enjoy.

Secret Google Ranking Document Posted Online

Most of my blog posts don’t get many hits. It’s not that surprising; I don’t spend much time being thorough with the information I write about, because I don’t have as much time as I’d like to write a post. I do try to be original, so I’m not repeating the same stuff everyone else is writing, but I just don’t explain every little thing and tend to gloss over big topics. Well, I’m pretty bogged down with other things to do for work, so I’ve been keeping an eye out for pointers. Something light, fast, interesting; something to get me more hits than usual. What better place to look for clues than a search engine? So while doing some work, I snagged a snapshot of Yahoo’s home page during the Olympics:

Yahoo! Featured Articles

whaaa . . . Gangsta synchronized swimmers!!! DO elephants hold grudges . . . even long term grudges!?! OF COURSE I WANT TO SEE WORST DRESSED CELEBRITIES!!!!!!!!

Well, there you go, it doesn’t get any more compelling than that. It’s a slam dunk for Yahoo. So here I am, Steve L, your average Joe Blogger: how can I put Yahoo!’s methodology to work to generate traffic for my own blog post?

Well, I feel like I’m on the shoulders of giants now, and for this post, I’d like to share with you a very special, very secret Google document I found. Tucked away in Google’s “Webmaster Guidelines” section, there’s an article explaining how webmasters can get a number 1 page ranking on Google. Not only that, Google leaked a document detailing the formula behind the search engine.

And if that’s not enough, you can see former Google spokesperson, Vanessa Fox, nude.

And, finally, here’s a picture of me, worst hair in the internet marketing industry, probably the worst dressed as well, but it’s hard to tell from the photo:

Steve L

An unscientific case study of online curtain-shopping

So, I had to buy some curtains last night. I thought I would buy them the old-fashioned way, by driving to a nearby store, but I discovered that the local store didn’t sell curtains in the dimensions I needed. I could get cafe curtains or long panels at the store, but nothing in between.

So, I decided to go online. No, I wouldn’t be able to touch the fabric, yes, it would mean I’d have to postpone my new-curtain gratification, but it seemed like online there would at least be a wider range of sizes to choose from. At least, that was my first assumption.

I decided to go with known vendors, so instead of typing curtains into Google and seeing what I found, I tried Sierra Trading Post (an odd choice, I know). Here’s what I found:
Sierra Trading Post - Curtains
They had 43 results; that seemed promising. The problem is that I wasn’t looking for “curtains”; I was looking for a pair of curtains to fit a 46″x53″ expanse of windows. So I didn’t need floor-length panels, and that was all I could see as I scrolled down the page. Worse, I wasn’t able to narrow the results to see if they had any curtains in the sizes I was interested in. I had to review each listing. OK, well, maybe an outdoor surplus website isn’t the place to buy home furnishings.

Sears.com Curtain Results
Next, I tried sears.com and landsend.com, because I have Sears gift cards I wanted to use up (Lands’ End is a Sears company). Sears had 153 curtain listings on its site, but again, there was no way to narrow results by size, so I would have had to page through all 153 results to find the few, if any, available in the size I wanted. Again, most appeared to be panels for large French-door-type windows, not the kind I wanted. I was motivated by the magic of gift cards, though, so I dutifully checked the list. They did offer options to narrow by brand name, but I didn’t recognize the brands, and I was really only interested in curtain size, which wasn’t among the “narrow your search” options. I did find one curtain that might have worked, and I held it in my shopping cart, but I wasn’t sure it was the best option. For one, it seemed expensive (each curtain was $60 and I needed 4 of them…).

Curtains on LandsEnd.com
I checked LandsEnd.com. Neither “curtain” nor “curtains” turns up anything in its search, but if you click through Home Accents, you can get to Curtains and Rods, which had nice-looking stuff, but again, nothing in the size I wanted. Since there were only two choices, though, I didn’t mind paging through them.

Finally, I did what I should have done in the beginning: a Google search for curtains. I skimmed the organic and the sponsored listings. I couldn’t go with JC Penney, the top listing, because I had the Sears card. It just didn’t seem right. I skimmed down to Overstock.com and clicked through its ad. When I searched for “curtains” on Overstock, I got 11 pages, 60 results per page. And, best of all, I got a way to narrow my search by size, exactly what I’d been struggling with on the previous sites. I found several pairs of curtains that matched my specifications, and while I was there, I recalled I had a few other things to buy. So Overstock did right by me, search-wise, and I ordered more than just curtains while I was there.
Curtains on Overstock.com
So in my completely unscientific, nonrandom survey of 4 sites, Overstock.com offered the best online experience because it gave me a quick way to find what I needed.
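The size filter that won me over is a standard faceted-search control. Here is a minimal sketch of its two halves, using a made-up five-item catalog with sizes written as width x length in inches (not any retailer’s actual data or code): count the results per size for the “narrow your search” sidebar, then keep only the chosen size so the shopper never pages through the rest.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Curtain:
    name: str
    size: str  # "width x length" in inches, e.g. "42x54" (hypothetical format)


def size_facet(results: List[Curtain]) -> Dict[str, int]:
    """Count results per size, most common first, as a sidebar facet would."""
    return dict(Counter(c.size for c in results).most_common())


def narrow_by_size(results: List[Curtain], size: str) -> List[Curtain]:
    """Keep only the results in the chosen size."""
    return [c for c in results if c.size == size]


# Hypothetical search results for "curtains".
catalog = [
    Curtain("Linen panel", "50x84"),
    Curtain("Velvet panel", "50x95"),
    Curtain("Cotton tier", "42x54"),
    Curtain("Sheer panel", "50x84"),
    Curtain("Thermal tier", "42x54"),
]

print(size_facet(catalog))
print([c.name for c in narrow_by_size(catalog, "42x54")])
```

With 153 listings instead of five, the facet turns a page-by-page review into a single click.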

Why is Flash hard on SEO?

It’s fairly common knowledge by now that Adobe has started an initiative to provide automated Flash indexing for the major search engines (Google describes its involvement here). As exciting as the idea is, the consensus in the SEO community is that the ability to truly index Flash content isn’t here yet, and won’t be for a while.

These articles give good practical reasons (i.e., low-ranking results) why Flash is bad for SEO. But what’s the big deal? All the content is now transparent, so why can’t crawlers work through a Flash animation the same way a person does, identifying content and judging its relevance?

The short answer is that search engine bots are pathfinders, and Flash movies don’t have good paths for them to follow or find. Pathfinding is the act of mapping out a defined space where a bunch of pages are linked to each other. Although there are some mathematical challenges to doing this efficiently, it doesn’t take a particularly smart agent to find all the paths on a website and how they interlink.

The structure of static websites lends itself to this approach. But Flash can create literally anything, from a website simulation to a computer game, and what Flash developers build is generally something in between. How does a crawler, whose goal is to find paths across a network of pages, know how to deal with a button that triggers an “event”, such as playing a sound or revealing a content snippet?
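The pathfinding idea can be sketched as a breadth-first traversal of a link graph. This is a toy model under our own assumptions (a dictionary mapping hypothetical page URLs to the links found on each page), not how any particular search engine’s crawler is implemented. What it illustrates is that a crawler only discovers what it can reach through explicit links, so content that sits behind a Flash event rather than a link never enters its queue.

```python
from collections import deque
from typing import Dict, List, Set


def crawl(links: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first discovery of every page reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: /products is only reachable through a button event
# inside the Flash movie, so no plain link to it exists in the graph.
site = {
    "/": ["/about", "/intro.swf"],
    "/about": ["/", "/contact"],
    "/contact": ["/"],
    "/intro.swf": [],  # the movie exposes no crawlable links
    "/products": ["/"],
}

print(sorted(crawl(site, "/")))
```

The traversal itself is trivial; the hard part for a crawler is that a Flash movie gives it no edges to follow.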

It probably can’t. Maybe someday crawlers will be smart enough to know that an event they “crawl” is really the jump button in a computer game, but in the meantime such information will just confuse and muddle things. Superficial attempts to address the issue, largely by providing crawlers with “alternative” content that mimics the content of a Flash site, result in lower values from Google’s crawlers and create serious maintenance headaches, in the same way that managing multilingual sites doubles web team workloads.

So what do you tell your developers, who rightly love Flash for its versatility, portability, and relatively low programming effort? Three things:

  • If you must use Flash, use it to tell a visual story, not a content one, as this animation on Saline Lectronics does to create high-impact events without affecting copy.
  • Whatever you do, don’t use Flash as the framework for building the site, and avoid embedding content in it. Use CSS instead.
  • Build your site with a pathfinder philosophy. A good site is a series of related ideas connected by paths, each one of which should be a road to some related or similar level of understanding.

Think about your site this way, and your visitors will reward you. The crawlers will too!

The OS X One-size-fits-all Text Size

OS X has some fine accessibility features. For example, the ability to zoom in by holding down the control key and scrolling with the mouse is excellent. What it lacks is any way to increase the operating system’s text size.

The OS X Universal Access Dialog
Sure, you can alter the size of icons and text on the desktop. That is all. If you’ve got, for example, poor eyesight and you want to find an item in the menu bar? Too bad. Need to sort out some files in a Finder window? Too bad. Want to configure some preferences in an application? Too bad. Interestingly, in the Universal Access dialog, the place where one can adjust various accessibility features, the text is larger than normal. This is the only place in OS X with larger, easier-to-read text.

A visit to the Apple support discussion board suggests that the only option is to decrease the screen resolution, which renders everything larger and blurrier and removes any advantage of having a screen capable of high resolution. Alternatively, you can use the zoom functionality. That option reduces clarity only slightly, but it means you can no longer see the whole screen at once without moving the mouse.

It is a completely valid philosophy to reduce usability problems by reducing complexity. After all, a user could easily get him or herself into trouble tweaking the UI. On the other hand, though, this is not a purely cosmetic issue. A user may need to be able to modify the size of text in OS X (and perhaps other aspects of the appearance).

How Information Scatter Informs Content

Not too long ago, I centered a post around a diagram of the web from a network perspective, with components labeled. The point was to take a closer look at the structure of the internet, beyond the simple notion of billions of webpages connected by hyperlinks.

Network metrics like degree and betweenness have also been discussed, but still at the level of webpages.

This level of analysis ignores the primary function of most webpages: to inform visitors on a given topic. Many company sites intended only to provide marketing information have adopted the strategy of including information on the topic of interest. SEOMoz advises in its SEO for Beginners Guide, “Get out into the forums, blogs, and communities where folks in your industry spend their online discussion time. Note the most frequently asked questions, the most up-to-date topics, and the posts or headlines that generate the most interest. Apply this knowledge when you create high-quality content and directly address your market’s needs.” This works because (in addition to increasing link equity) it extends targeting to visitors earlier in the sales process: those still researching, but soon to move into the buying stage.

So how does one design the pages on a site that will provide the information, in a way that increases the likelihood of reaching these visitors? Should you provide rare facts, or more general ones? How dense should the facts per page be? It’s easy to find bloggers championing relevant content, but hard to find anyone addressing these questions online.

Research by UM researchers on the distribution of information across the web, which they call ‘Information Scatter’, helps address them. They built a bipartite (or two-mode) network, one with two different types of nodes: those representing facts relevant to a given industry or subject area, and those representing the webpages that contain them. The researchers constructed a graph of this type around melanoma facts online:
melanoma info web graph

It has been previously determined that the distribution of facts online is highly skewed, with a few documents containing many facts, and many documents having only a few. Facts can be either general, or rare. This study found that the rare facts are found on the pages with many facts, while general facts tend to appear alone or with only a few other facts.
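A two-mode fact/page network is simple to represent. The sketch below uses made-up page and fact labels (not the study’s melanoma data) and computes the two quantities this paragraph relies on: how many facts each page holds, and how rare each fact is, measured here as the number of pages that mention it. The rarity cutoff is an arbitrary assumption chosen for illustration; with this toy data the rare facts end up on the one fact-dense page, mirroring the pattern the study reports.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical two-mode network: page -> set of facts it contains.
pages: Dict[str, Set[str]] = {
    "overview": {"f1", "f2", "f3", "f4", "f5"},
    "faq": {"f1", "f2"},
    "news": {"f1"},
    "forum": {"f2"},
    "blog": {"f1", "f2"},
}

# A fact's degree = the number of pages that mention it.
fact_degree = Counter(fact for facts in pages.values() for fact in facts)

RARE_THRESHOLD = 1  # assumption: a fact found on only one page counts as rare


def rare_facts(page: str) -> Set[str]:
    """Facts on this page whose degree is at or below the rarity cutoff."""
    return {f for f in pages[page] if fact_degree[f] <= RARE_THRESHOLD}


# Pages ordered by fact count, with the rare facts each one carries.
for page, facts in sorted(pages.items(), key=lambda kv: -len(kv[1])):
    print(page, len(facts), sorted(rare_facts(page)))
```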

The common random walk model of online behavior says only that lots of inbound links to your site will increase its chances of being found. By modeling a potential visitor’s research process (searching for pages about one fact, starting at one results page, and then incrementally expanding the search to another fact encountered on that page), the UM researchers went further than the random walk model, using the two-mode network to investigate which types of facts and pages matter most in a visitor’s search process.

The results showed the pages most likely to lie on paths between topics, or in network terms, those with high betweenness, to be most important. One of the most interesting findings of this study involves what these pages look like in terms of fact type and distribution.

Let’s take an example from the tire industry. Topics pertinent to those buying tires might include tire care, tire mileage, tire problems, buying tires, etc. The results from this study suggest that a page that included tire facts pertinent to all these categories would be the most important to network traversal. In addition, it was found that a page could address these topics with just one or two facts each without losing its value to an information seeker.

Another option to increase a page’s betweenness, based on this study, is the inclusion on the page of a rare fact that ties two topic areas together. For example, a fact might center around a rarer tire problem, and suggest that the only way to diagnose it properly is by checking your tire air pressure, which pertains to tire care. This would link the tire problems and tire care topics.
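Betweenness is easy to compute for a toy version of the tire example. The sketch below implements Brandes’ algorithm for an unweighted, undirected graph and runs it on a hypothetical two-mode network: four single-topic pages with two facts each, plus one overview page carrying one fact from every topic. The page and fact names are our own illustration, not data from the study. The overview page scores far higher than any single-topic page, because every path between topics runs through it.

```python
from collections import deque
from typing import Dict, List, Set


def betweenness(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Brandes' betweenness centrality for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected: each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


def two_mode(pages: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Build an undirected page-fact graph from a page -> facts mapping."""
    graph: Dict[str, Set[str]] = {}
    for page, facts in pages.items():
        graph.setdefault(page, set())
        for fact in facts:
            graph[page].add(fact)
            graph.setdefault(fact, set()).add(page)
    return graph


# Hypothetical tire pages; overview_page has one fact from each topic.
tire_pages = {
    "care_page": ["care_1", "care_2"],
    "mileage_page": ["mileage_1", "mileage_2"],
    "problems_page": ["problems_1", "problems_2"],
    "buying_page": ["buying_1", "buying_2"],
    "overview_page": ["care_1", "mileage_1", "problems_1", "buying_1"],
}

scores = betweenness(two_mode(tire_pages))
for page in tire_pages:
    print(page, scores[page])
```

The same functions can score the rare-fact case: add a fact that appears on both care_page and problems_page and that fact node picks up the paths between those two topics.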

The Subjective Web: Online Opinion Mining

At the end of July, Microsoft Research held its 2008 Faculty Summit to survey the state of computing R & D, which this year included a social media summit. A major topic of conversation was the transition of the internet from a network of documents to a network of people.

As participant (host) and Microsoft Scientist Matthew Hurst explains on his blog, “The PageRank era is marked by a very simple link with no explicit meaning and a simple assumption (a positive endorsement).” But this assumption of positive endorsement is becoming unnecessary as more and more direct evidence of people’s opinions and categorizations of content are available online. Research repeatedly reveals that others take notice of human-generated tags and reviews: “consumers report being willing to pay from 20% to 99% more for a 5-star-rated item than a 4-star-rated item (with variance depending on type of item/service)”, is just one example.

Many are excited by how much less processing-intensive the online content tagging process becomes with this trend – clusters of pages and facts seem to grow organically as a result of human tagging. This helps overcome previous problems related to content indexing within info retrieval, such as the gap between the language that the businesses or organizations use to label their content and the terminology preferred by their customers/users.

But this transition also brings challenges that are less discussed. Says one scientist, aptly describing the phenomenon, “fragmenting media and changing consumer behavior have crippled traditional [media] monitoring methods. Technorati estimates that 75,000 new blogs are created daily, along with 1.2 million new posts each day, many discussing consumer opinions on products and services. Tactics [of the traditional sort] such as clipping services, field agents, and ad hoc research simply can’t keep pace.” Call it what you will: Brand Monitoring, Online Image Tracking, Buzz Monitoring, Online Anthropology, Conversation Mining, Online Consumer Intelligence, Market Influence Analytics … The challenges remain the same. As an example, I think of a project I did here at Pure Visibility last year, which involved analyzing online review content related to a client’s company. After gathering the reviews (in the hundreds), I was faced with the daunting task of mining them for basic information, like the overall majority sentiment expressed and how it correlated with the source. My ultimate method was mostly manual and more than a little tedious.

Hurst’s blog references a new book by Pang and Lee that surveys the state of Opinion Mining and Sentiment Analysis (basically, data mining and classification applied to human-generated content). In addition to interesting facts on the power of opinions like those above, the book clearly outlines the process such analysis requires and the associated challenges. For example, incorporating user opinions into a search engine typically requires the following steps:

  1. determining whether the user is looking for subjective information
  2. accurately classifying docs into the opinionated and non-opinionated bins
  3. identifying the overall sentiment expressed and/or specific opinions regarding particular aspects
  4. summarizing information, including aggregating votes across different rating scales, highlighting some opinions, representing points of disagreement/consensus, identifying opinion holders, etc.
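The four steps above can be sketched as a small pipeline. This is deliberately naive: it uses tiny hand-picked keyword lexicons, which is exactly the approach Pang and Lee describe as hard to get right, so treat it as an illustration of the stages under our own assumptions, not as their method. The cue words, lexicons, sample documents, and rating scales are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Toy lexicons (hypothetical); a real system would learn these from data.
QUERY_CUES = {"review", "reviews", "opinion", "opinions", "worth", "vs"}
POSITIVE = {"great", "love", "excellent", "reliable", "good"}
NEGATIVE = {"bad", "terrible", "broken", "hate", "poor"}


def tokens(text: str) -> List[str]:
    return [t.strip(".,!?").lower() for t in text.split()]


# Step 1: is the user looking for subjective information?
def wants_opinions(query: str) -> bool:
    return any(t in QUERY_CUES for t in tokens(query))


# Steps 2-3: opinionated or not, and if so, positive (+1) or negative (-1).
def polarity(doc: str) -> Optional[int]:
    words = tokens(doc)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos == neg:  # no sentiment words, or a tie: treat as non-opinionated
        return None
    return 1 if pos > neg else -1


# Step 4: normalize ratings from different scales to 0..1 and summarize.
def summarize(docs: List[str], ratings: List[Tuple[float, float]]) -> Dict[str, float]:
    labels = [p for p in (polarity(d) for d in docs) if p is not None]
    normalized = [score / scale for score, scale in ratings]
    return {
        "opinionated_docs": len(labels),
        "share_positive": labels.count(1) / len(labels) if labels else 0.0,
        "mean_rating": mean(normalized) if normalized else 0.0,
    }


query = "acme tires reviews"
docs = [
    "Great tires, very reliable.",
    "Tread wore out fast, terrible value.",
    "Acme opened a new store in May.",
]
ratings = [(4, 5), (8, 10)]  # 4 of 5 stars, 8 of 10 points

if wants_opinions(query):
    print(summarize(docs, ratings))
```

A sentence like “not bad at all” already defeats this lexicon, which is a small taste of why the subtlety of sentiment makes the task harder than topic classification.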

The challenges are numerous. To summarize some of the excellent points made by Pang and Lee, I sketched out the following table, which compares opinion mining to traditional text mining:

Opinion Mining | Fact-based Text Analysis
--- | ---
relatively few classes, generalizing over many domains/users | often numerous classes (i.e., topic classification)
classes represent opposing (binary classification) or ordinal/numerical categories | classes can be unrelated
order can outweigh frequency (in importance) | frequency typically correlates with classification
sentiment typically expressed in a subtle manner, not isolated to a single sentence | though dependent on document length, summarization by single-sentence extraction is often reasonable
non-trivial task of defining human-preferred keywords | accurate classification possible via data-driven methods alone

To clarify this last point, the authors note that this fact alone does not make the task harder than traditional topic classification, since data-driven approaches can be applied here too, improving accuracy over classification with a human-picked keyword list. The problem is that data-driven methods for opinion analysis reach only about 80% accuracy, which still falls short of the performance expected in traditional topic-based classification.

While these challenges may seem daunting enough to keep this work on the horizon for years to come, the fact that this book was written by a Yahoo research scientist and a researcher at one of the country’s top CS schools suggests that the right people are thinking about these trends. Significant changes in how we use the web may not be far off.

Will Paid Search Boost John McCain’s Brand?

It’s strange that we’ve started discussing branding, because there has been a sudden surge of arguments that search results have a brand impact. This article from Search Engine Watch, for example, implies that the paid search advertising run by John McCain, who is currently spending enough to get four times as many paid search impressions as Barack Obama, will have a positive brand effect.

These assertions are a mystery to me, because they seem to intentionally miss the distinction between something that creates brand awareness and something that results in an actual branding effect. The distinction is actually pretty simple:

Brand Awareness is an “I know about” indicator. It is something people know in their heads.

Branding is an “I feel about” indicator. It is something people know in their gut.

So what I want to know is: how exactly is a John McCain for President banner ad going to accomplish the latter?

It can’t. Awareness is not branding. The only effective measure of branding is whether or not a person CHANGES BRANDS. Exposure to new brands is only part of the effort.

It is strange as an analyst to be making this distinction, but I have seen it repeatedly even among my friends. Branding is incredibly powerful, subtle, and pervasive. People who will insist to their last breath that they are unaffected by brand will be heavily influenced by it (to the point of self-parody). Branding–not awareness–is the only real measure of loyalty or purchasing trends.

I am taking this specific stand because branding is still the consequence of excellent products that have some emotional impact on a user. Search engine results do not provide that. They provide the OPPORTUNITY for that to occur at a fraction of the historical cost, which is a key, critical point. But I don’t want to oversell its value. For branding to truly work, there still needs to be that traditional marketing and self-identification that is so very hard to replicate or create.

Stolen Analytics Code?

Has stealing other sites’ Google Analytics code snippet become a valid (albeit desperate) strategy to lure visitors to a website?

Last week, in glancing at the ‘Content Drilldown’ report in the Content Reports of the profile for our company website, I noticed not only the two top-level sites I expected to see, but also the names of two domains that I’d never seen before, one of which contained some language I won’t repeat here, for the sake of professionalism.

Was this some intern’s idea of a joke, I wondered? It seemed highly unlikely, if only because both sites were Russian (.ru). Several other analysts joined the investigation into the anomaly. It was well warranted, since visits to these domains were likely compromising the quality of our profile’s data, which we use not only in our web strategy but also for demonstrations during the company’s Google Analytics training course.

It was discovered that the less foul-named of the two was a porn site. The other displayed an html page with a design featuring a question mark, as it churned to load what eventually turned into a site that, true to its name, out-porned the other.

We checked the account numbers in the code on these sites’ current pages, only to find that they no longer matched ours. One possible explanation in a case like this is an account number mistyped by one digit. But the new numbers in the code snippets on these sites were very different from ours.
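Checking a page for your account number is easy to script. Here is a minimal sketch that pulls Google Analytics account IDs of the UA-XXXXXX-X form out of a page’s source and sorts each into the three cases discussed above: our own ID, a one-character typo, or something unrelated. The account number and page snippet are hypothetical, and fetching the page is left out.

```python
import re
from typing import List

# Matches Google Analytics account IDs of the form UA-XXXXXX-X.
UA_PATTERN = re.compile(r"UA-\d+-\d+")


def tracker_ids(html: str) -> List[str]:
    """Return every GA account ID that appears in a page's source."""
    return UA_PATTERN.findall(html)


def digit_differences(a: str, b: str) -> int:
    """Count differing characters; IDs of different length count as far apart."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def classify(found: str, ours: str) -> str:
    diff = digit_differences(found, ours)
    if diff == 0:
        return "our account ID: this site is sending hits to our profile"
    if diff == 1:
        return "one character off: plausibly a mistyped account number"
    return "unrelated account ID"


OURS = "UA-1234567-1"  # hypothetical account number
page = '<script>var pageTracker = _gat._getTracker("UA-1234567-1");</script>'

for ua in tracker_ids(page):
    print(ua, "->", classify(ua, OURS))
```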

Is this a desperate strategy to gain visitors, one based on the assumption that the sites will eventually be seen via the ‘Content Drilldown’ report (or another of the Content Reports) by a curious analyst examining the pirated site’s data, one who happens to have vulgar tastes and time to spare?

Perhaps that was not the best segue into advising you to watch your ‘Content Drilldown’ report more closely.
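Watching for stray domains can also be partly automated. The sketch below assumes you have exported rows of (hostname, pageviews) from your reports; the export format, hostnames, and allowlist are hypothetical. It totals pageviews for every hostname that isn’t one of yours, case-insensitively, largest first. Inside GA itself, an include filter on the hostname field can keep such hits out of a profile in the first place.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical allowlist: the hostnames your own sites actually serve.
EXPECTED_HOSTS = {"www.example.com", "blog.example.com"}


def unexpected_hosts(rows: Iterable[Tuple[str, int]], expected: Set[str]) -> List[Tuple[str, int]]:
    """Total pageviews per hostname outside the allowlist, largest first."""
    totals: Dict[str, int] = {}
    for host, pageviews in rows:
        host = host.lower()
        if host not in expected:
            totals[host] = totals.get(host, 0) + pageviews
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical exported report rows.
report = [
    ("www.example.com", 5200),
    ("blog.example.com", 830),
    ("stranger.example", 41),
    ("Stranger.example", 9),
    ("another.example", 12),
]

print(unexpected_hosts(report, EXPECTED_HOSTS))
```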
