A Guide to Sampling in Google Analytics



Posted by Tom.Capper

Sampling is a process used in statistics when it’s unfeasible or impractical to analyse all the data that exists. Instead, a small, randomly selected subset is used to keep things manageable. Many analytics platforms use some sort of sampling to keep report loading times in check, and there seem to be three schools of thought when it comes to sampling in analytics. There are those who are terrified of it, insisting on unsampled versions of any report. Then there are those who are relaxed about it, trusting the statistical logic. And then, lastly, there are those who are oblivious.

All three are misguided.

Sampling isn’t something to fear, but, in Google Analytics in particular, it can’t always be trusted. Because of that, it’s definitely worth your time to understand when it occurs, how it affects your work, and how it can be avoided.

When it happens

You can always tell when sampling is being used, because of this line at the top of every report:

If the percentage is less than 100%, then sampling is in progress. You’ll notice above that I’ve produced a report based on more than half a billion sessions without any sampling — sampling isn’t just about the sheer number of sessions involved in a report. It’s about the complexity of what you’re asking the platform to report on. Contrast the below (apologies for the small screenshots; I wanted to make sure the whole context was included, so have added captions explaining just what you’re looking at):

No segment applied, report based on 100% of sessions

Segment applied, report based on 0.17% of sessions

The two are identical apart from the use of a segment in the second case. Google Analytics can always provide unsampled data for top-line totals like that first case, but segments in particular are very prone to prompting sampling.

The exact same level of sampling can also be induced through use of a secondary dimension:

Secondary dimension applied, report based on 0.17% of sessions

A few other specialised reports are also prone to this level of sampling, most notably:

  • The Ecommerce Overview
  • “Flow Reports”

Report based on 0.17% of sessions

Report based on <0.1% of sessions

To summarise so far, sampling can happen when we use:

  • A segment
  • More than one dimension
  • Certain detailed reports (including Ecommerce Overview and AdWords Campaigns)
  • “Flow” reports

The accuracy of sampling

Sampling, for the most part, is actually pretty reliable. Take the below two numbers for organic traffic over the same period, one taken from a tiny 0.17% sample, and one taken without sampling:

Report based on 0.17% of sessions, reports 303,384,785 sessions via organic

Report based on 100% of sessions, reports 296,387,352 sessions via organic

The difference is just 2.4%, from a sample of 0.17% of actual sessions. Interestingly, when I repeated this comparison over a shorter period (last quarter), the size of the sample went up to 71.3%, but the margin of error was fairly similar at 2.3%.
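The arithmetic behind that 2.4% figure is simply the relative difference between the sampled estimate and the unsampled total. A quick sketch, using the two session counts from the screenshots above:

```python
# Session counts from the two reports above: one sampled, one not.
sampled_estimate = 303_384_785    # from the 0.17% sample
unsampled_total = 296_387_352     # from the 100% report

# Relative difference between the sampled estimate and the true total
diff = abs(sampled_estimate - unsampled_total) / unsampled_total
print(f"{diff:.1%}")  # -> 2.4%
```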

It’s worth noting, of course, that the deeper you dig into your data, the smaller the effective sample becomes. If you’re looking at a sample of 1% of data and you notice a landing page with 100 sessions in a report, that’s based on 1 visit — simply because 1 is 1% of 100. For example, take the below:

Report based on 45 sessions

Eight percent of a whole year’s traffic to Distilled is a lot, but 8% of organic traffic to my profile page is not, so we end up viewing a report (above) based on 45 visits. Whether or not this should concern you depends on the size of the changes you’re looking to detect and your threshold for acceptable levels of uncertainty. These topics will be familiar to those with experience in CRO, but I recommend this tool to get you started, and I’ve written about some of the key concepts here.

In extreme cases like the one above, though, your intuition should suffice – that click-through from my /about/ page to /resources/…tup-guide/ claims to feature in 12 sessions, and is based on 8.11% of sessions. As 8.11% of 12 is roughly 1, we know that this figure is in fact based on a single session. Not something you’d want to base a strategy on.
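That back-of-the-envelope check generalises: multiply the reported figure by the sampling percentage to estimate how many raw sessions it actually rests on. A small helper, using the numbers from the examples above:

```python
def underlying_sessions(reported, sample_pct):
    """Roughly how many raw sessions a sampled GA figure is based on."""
    return reported * sample_pct / 100

# 12 reported sessions at 8.11% sampling -> based on a single session
print(round(underlying_sessions(12, 8.11)))   # -> 1

# 100 reported sessions at 1% sampling -> also a single session
print(round(underlying_sessions(100, 1)))     # -> 1
```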

If any of the above concerns you, then I’ve some solutions later in this post. Either way, there’s one more thing you should know about. Check out the below screenshot:

Report based on 100% of sessions, but “All Users” only accounts for 38.81% “of Total”

There’s no sampling here, but the number displayed for “All Users” in fact only contains 38.8% of sessions. This happens when a report has more than 1,000,000 rows (as indicated by the yellow “high-cardinality” warning at the top) and a segment is applied: the rows grouped into “(other)” are hidden while the segment is active. Regardless of any sampling, the numbers in the individual rows below remain as accurate as they would be otherwise (apart from “(other)” being missing), but the segment totals at the top end up of limited use.

So, we’ve now gone over:

  • Sampling is generally pretty accurate (+/- 2.5% in the examples above).
  • When you’re looking at small numbers in reports with a high level of sampling, you can work out how many sessions they’re based on.
    • For example, 1% sampling showing 100 sessions means 1 session was the basis of the number in the report.
  • You should keep an eye out for that yellow high-cardinality warning when also using segments.

What you can do about it

Often it’s possible to recreate the key data you want in alternative ways that do not trigger sampling. Mainly this means avoiding segments and secondary dimensions. For example, if we wanted to view the session counts for the top organic landing pages, we might ordinarily use the Landing Pages report and apply a segment:

Landing Pages report with Organic Traffic segment, based on 71.27% of sessions

In the above report, I’ve simply applied a segment to the landing pages report, resulting in sampling. However, I can get the same data unsampled — in the below case, I’ve instead gone to the “Channels” report and clicked on “Organic Search” in the report:

Channels > Organic Search report, with primary dimension “Landing Page”, based on 100% of sessions

This takes me to a report where I’m only looking at organic search sessions, and I can pick a primary dimension of my choice — in this case, Landing Page. It’s worth noting, however, that this trick does not function reliably — when I replicated the same method starting from the “Source / Medium” report, I still ended up with sampling.

A similar trick applies to custom segments — if I wanted to create a segment to show me only visits to certain landing pages, I could instead write a regex advanced filter to replicate the functionality with less chance of sampling:
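As a sketch of the idea, here’s a regex of that kind tested locally in Python — the pattern and paths are purely hypothetical, and in GA itself you’d paste only the pattern into the advanced filter box:

```python
import re

# Hypothetical advanced-filter pattern: match only certain landing pages
pattern = re.compile(r"^/(blog|resources)/")

landing_pages = ["/blog/guide-to-sampling/", "/about/", "/resources/setup-guide/", "/"]
matching = [p for p in landing_pages if pattern.search(p)]
print(matching)  # -> ['/blog/guide-to-sampling/', '/resources/setup-guide/']
```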

There are also a few more extreme solutions. First, you can create duplicate views, then apply view-level filters, to replicate segment functionality (permanently for that view):

Second, you can use the API and Google Sheets to break up a report into smaller date ranges, then aggregate them. My colleague Tian Wang wrote about that tool here.
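The date-splitting approach can be sketched as follows: keep each query’s session count under GA’s sampling threshold, then sum the additive metrics (sessions, pageviews) across chunks. Note that rates like bounce rate must be re-derived from the summed counts, not averaged. The 7-day chunk size and the fetch function are assumptions for illustration:

```python
from datetime import date, timedelta

def date_chunks(start, end, days=7):
    """Split [start, end] into consecutive ranges of at most `days` days."""
    chunks, cursor = [], start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=days - 1), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks

# Hypothetical usage: query each small range via the GA Reporting API, then sum.
# def fetch_sessions(start, end): ...   # one small, unsampled API call
# total = sum(fetch_sessions(s, e)
#             for s, e in date_chunks(date(2016, 1, 1), date(2016, 3, 31)))

for s, e in date_chunks(date(2016, 1, 1), date(2016, 1, 20)):
    print(s, e)
```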

Third, there’s GA Premium, which, for a not inconsiderable cost, gets you this button:

To recap, here’s how you can avoid sampling:

  • You can construct reports differently to avoid segments or secondary dimensions and thus reduce the chance of sampling being triggered.
  • You can create duplicate views to show you subsets of your data that you’d otherwise have to view sampled.
  • You can use the GA API to request large numbers of smaller reports then aggregate them in Google Sheets.
  • For larger businesses, there’s always the option of GA Premium to receive unsampled reports.

Discussion

I hope you’ve found this post useful. I’d love to read your thoughts and suggestions in the comments below.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Article Source: The Only Yard For The Internet Junkie

HTTPS Tops 30%: How Google Is Winning the Long War



Posted by Dr-Pete

[Estimated read time: 6 minutes]

It’s been almost two years (August 2014) since Google announced that HTTPS was a ranking signal. Speculation ran rampant, as usual, with some suggesting there was little or no benefit to switching (and possibly significant risk) while others rushed to sell customers on making the HTTPS switch. Two years later, I believe the data speaks for itself — Google is fighting, and winning, a long war.

What’s happened since?

If you only consider the impact of Google’s original HTTPS update, I understand your skepticism. Prior to the update, our 10,000-keyword tracking system (think of it as a laboratory for studying Google searches) showed that roughly 7% of page-1 Google results used the “https:” protocol. A week after the update announcement, that number had increased only slightly, to just over 8%:

The blue/purple show the before/after based on the announcement date. As you can see, the update probably rolled out over the course of a few days. Even over a 2-week period, though, the impact appears to be fairly small. This led many of us to downplay Google’s statements and ignore HTTPS for a while. The next graph is our wake-up call:

As of late June, our tracking data shows that 32.5% (almost one-third) of page-1 Google results now use the “https:” protocol. The tiny bump on the far left (above “A-14” = August 2014) is the original HTTPS algorithm update. The much larger bump in the middle is when Wikipedia switched to HTTPS. This goes to show the impact that one powerhouse can have on SERPs, but that’s a story for another time.

What does it mean?

Has Google rolled out multiple updates, rewarding HTTPS (or punishing the lack of it)? Probably not. If this two-year trend were purely a result of algorithm updates, we would expect to see a series of jumps and new plateaus. Other than the Wikipedia change and two smaller bumps, the graph clearly shows a gradual progression.

It’s possible that people are simply switching to HTTPS for their own reasons, but I strongly believe that this data suggests Google’s PR campaign is working. They’ve successfully led search marketers and site owners to believe that HTTPS will be rewarded, and this has drastically sped up the shift. An algorithm update is risky and can cause collateral damage. Convincing us that change is for our own good is risk-free for Google. Again, Google is fighting the long war.

Is our data accurate?

Of course, our tracking set is just one sample of search data. The trendline is interesting, but it’s possible that our keywords are overstating the prevalence of HTTPS results. I presented a number of roughly 30% at SMX Advanced in mid-June. Later that same day, Google’s Gary Illyes called me out and confirmed that number:

Gary did not give an exact figure, but essentially gave a nod to the number, suggesting that we’re in the general ballpark. A follow-up tweet confirms this interpretation:

This is as close to confirmation as we can reasonably expect, so let’s assume we showed up to the right ballgame and our tickets aren’t counterfeit.

Why does 30% matter?

OK, so about one-third of results use HTTPS. Simple arithmetic says that two-thirds don’t. Projecting the trend forward, we’ve got about a year and a half (16–17 months) before HTTPS hits 50%. So, is it time to panic? No, probably not, but here’s the piece of the puzzle you may be missing.
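That projection is a straight-line extrapolation from the two readings in the post — roughly 7% in August 2014 and 32.5% in late June 2016. The ~22-month span between them is an approximation here, and adoption need not stay linear:

```python
start_pct, end_pct = 7.0, 32.5   # HTTPS share of page-1 results: Aug 2014 -> late June 2016
months_elapsed = 22              # approximate span between the two readings

rate = (end_pct - start_pct) / months_elapsed   # ~1.2 percentage points per month
months_to_50 = (50 - end_pct) / rate
print(round(months_to_50))  # -> 15, in the ballpark of the post's 16-17 months
```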

Google has to strike a balance. If they reward sites with HTTPS (or dock sites without it) when very few sites are using it, then they risk a lot of collateral damage to good sites that just haven’t made the switch. If, on the other hand, they wait until most sites have switched, a reward is moot. If 100% of sites are on HTTPS and they reward those sites (or dock the 0% without it), nothing happens. They also have to be careful not to set the reward too high, or sites might switch simply to game the system, but not too low, or no one will care. However I feel about Google on any given day, I acknowledge that their job isn’t easy.

If rewarding HTTPS too heavily when adoption is low is risky and rewarding it when adoption is too high is pointless, then, naturally, the perfect time to strike is somewhere in the middle. At 30% adoption, we’re starting to edge into that middle territory. When adoption hits something like 50–60%, I suspect it will make sense for Google to turn up the algorithmic volume on HTTPS.

At the same time, Google has to make sure that most of the major, trusted sites have switched. As of this writing, 4 of the top 5 sites in our tracking data are running on HTTPS (Wikipedia, Amazon, Facebook, and YouTube) with the only straggler being #5, Yelp. The top 5 sites in our tracking account for just over 12% of page-1 results, which is a big bit of real estate for only 5 sites.

Of the top 20 sites in our tracking data, only 7 have gone full HTTPS. That’s 35%, which is pretty close to our overall numbers across all sites. If Google can convince most of those sites to switch, they’ll have covered quite a bit of ground. Focusing on big players and convincing them to switch puts pressure on smaller sites.

In many ways, Google has already been successful. Even without a major, algorithmic HTTPS boost, sites continue to make the switch. As the number climbs, though, the odds of a larger boost increase. I suspect the war is going to be over sooner than the trendline suggests.

What are the risks?

Am I telling you to make the switch? No. While I think there are good reasons to move to HTTPS for some sites and I think most of Google’s motives are sincere on this subject, I also believe Google has been irresponsible about downplaying the risks.

Any major change to sitewide URLs is risky, especially for large sites. If you weigh the time, money, and risk of the switch against what is still a small algorithmic boost, I think it’s a tough sell in many cases. These risks are not theoretical — back in May, Wired.com wrote up the many problems they’ve encountered during their HTTPS switch, a switch that they’ve since paused to reconsider.

Like any major, sitewide change, you have to consider the broader business case, costs, and benefits. I suspect that pressure from Google will increase, especially as adoption increases, and that we’re within a year of a tipping point where half of page-1 results will be running on HTTPS. Be aware of how the adoption rate is moving in your own industry and be alert, because I suspect we could see another HTTPS algorithm update in the next 6–12 months.


Do Website Engagement Rates Impact Organic Rankings?



Posted by larry.kim

[Estimated read time: 11 minutes]

Your organic click-through rate is ridiculously important. While it may not be a direct ranking signal that’s even part of Google’s core algorithm, I believe CTR is an indirect signal that definitely impacts rank. And if you improve your click-through rate, you should see your rankings and conversions improve.


Although having a high organic CTR is crucial, having positive website engagement metrics is even more critical. What value is there in getting hundreds or thousands of people to click on your brilliant headlines if those people don’t stick around for more than a few seconds?

If Google values dwell time, is there a way to see it? YES! Today I’ll share some data that shows the relationship between engagement rates (such as bounce rate and time on site) and rankings.

One important note before we get started: Please don’t focus too much on the absolute bounce rate and time on site figures discussed in this article. We are only looking at figures for one particular vertical. The minimum expected engagement will vary by industry and query type.

Does Google measure dwell time? How is that different from bounce rate & time on site?

Yes. We know Google measures dwell time, or how much time a visitor actually spends on a page before returning to the SERPs.

In 2011, Google announced a new option that allowed us to block domains from appearing in our search results. If you clicked on a result and then returned to the SERP from the website within a few seconds, Google’s blocked sites feature would appear. Clicking it would let you block all results from that site.


Google told us they would study the data and considered using it as a ranking signal.

Although that feature is no longer with us, we know it was based on whether (and how quickly) you bounced back. So we know Google is definitely measuring dwell time.

The problem is, we don’t have a way to measure dwell time. However, we can measure three engagement metrics that are proportional to and directionally equivalent to dwell time: bounce rate, time on site, and conversion rate.

Does Bounce Rate Impact Organic Position?

OK, let’s get the official Google line out of the way. Google’s Gary Illyes tweeted the following in 2015: “we don’t use analytics/bounce rate in search ranking.” Matt Cutts said similar in the past. Pretty clear, right?

However, I’m not saying that bounce rate is used as a direct ranking factor. And Google definitely doesn’t need Google Analytics to compute dwell time. What I believe is that, in some Rube Goldbergian way, bounce rate does in fact (indirectly) impact rankings.

Does the data back that up? We looked to see if the bounce rate of the pages/keywords we were ranking for had any relationship to their ranking. Check out this graph:

Graph: bounce rate by organic ranking position

This is very peculiar. Notice the “kink” between positions 4 and 5? In mathematical terms, this is called a “discontinuous function.” What’s happening here?

Well, it seems like for this particular keyword niche, as long as you have a low bounce rate (below 76 percent) then you’re more likely to show up in positions 1 through 4. However, if your bounce rate is higher (above 78 percent), then you’re much less likely to show up in those coveted top 4 positions.

Am I saying bounce rate is part of the core search algorithm Google uses? No.

But I think there’s definitely a relationship between bounce rate and rankings. Looking at that graph, it leads me to believe that it’s no accident — but in fact algorithmic in nature.

My guess is that algorithms use user engagement as a validation method. Think of it as a “check” on click-through rates within the existing algorithm — one that hasn’t been publicly quantified.

Undoubtedly, click-through rates can be gamed. For example, I could promise you the digital equivalent of free beer and have a ridiculously high click-through rate.


But if there’s no free beer to be had, most (if not all) of that traffic will bounce right back.


So I believe Google is measuring dwell time (which is proportional to bounce rate) to check whether websites getting high CTRs actually deserve it and if the clicks are indeed valid, or if it’s just click bait.

One other question this discussion obviously raises is: do higher rankings cause higher engagement rates, as opposed to the other way around? Or could both be caused by some completely unrelated factor?

Well, unless you work at Google (and even then!) you may never know all the secrets of Google’s algorithm. There are things we know we don’t know!

Regardless, improving user engagement metrics, like bounce rate, will still have its own benefits. A lower bounce rate is just an indicator of success, not a guarantee of it.

Does time on site impact organic position?

Now let’s look at time on site, another metric we can measure that is proportional to dwell time. This graph also has a “kink” in the curve:

Graph: time on site by organic ranking position

It’s easy to see that if your keyword/content pairs have decent time on site, then you’re more likely to be in top organic positions 1–6. If engagement is weak on average, however, then you’re more likely to be in positions 7 or lower.

Interestingly, you get no additional points after you cross a minimum threshold of time on site. Even if people are spending 2 hours on your site, it doesn’t matter. I think you’ve passed Google’s test — passing it by even more doesn’t result in any additional bonus points.


Larry’s Theory: Google uses dwell time — which we can’t measure, but is proportional to user engagement metrics like bounce rate, time on site, and conversion rates — to validate click-through rates. These metrics help Google figure out whether users ultimately got what they were looking for.

Conversion rates: The ultimate metric

So now let’s talk about conversion rates. We know that higher click-through rates typically translate into higher conversion rates:

Graph: click-through rate vs. conversion rate

If you can get people really excited about clicking on something, that excitement typically carries through to a purchase or sign-up.

So what we need is an Engagement Rate Unicorn/Donkey Detector, to detect high and low engagement rates.


Before we go any further, we need to know: what is a good conversion rate?

Graph: average conversion rates, donkeys vs. unicorns

On average across all industries, site-wide conversion rate for a website is around 2 percent (the donkeys), while conversion rates for the top 10 percent of websites (the unicorns) get 11 percent and above. While absolute conversion rates vary wildly by industry, unicorns always outperform donkeys by 3–5x regardless of industry.

Remember, conversion rates are a very important success metric because you get the most value (you actually captured leads, sold your product, got people to sign up for your newsletter, or visitors did whatever else it was you wanted them to do), which means the user found what they were looking for.

How do you turn conversion rate donkeys into unicorns?


The way you don’t get there is by making little changes: the difference between donkeys and unicorns is simply too big. If you want to increase your conversion rates by 3x to 5x, then small, incremental changes of 2 or 3 percent usually won’t cut it.

What should you do?

1. Change your offer (in a BIG way)

Rather than A/B testing button color or image changes, you might be better off trashing your current offer and doing a new one.

Ask yourself: Why in the world are 98 percent of the people who see your offer not taking you up on it? Well, it’s probably because your offer sucks.


What can you offer that will resonate enough that 10 percent or more of the people who see it would be excited about signing up for it or buying it on the spot?

Be open-minded. The answer is probably something adjacent to what you’re currently doing.

For example, for my own company, five years ago our primary offer was to sign up for a trial of our software. It was somewhat complicated, people had to learn how to use the software, and not everyone made it through the process.

Then I had an epiphany: Why don’t I just grade people’s accounts without having them do a trial of our PPC management software, and just give them a report card? That increased my conversion and engagement rates by 10x, and the gains persisted over time. There is much more leverage in changing the offer versus, say, the image on an existing offer.

2. Use Facebook Ads

You can influence users even before they do searches. Brand awareness creates a bias in people’s minds which has a ridiculously huge impact on user engagement signals. We can do this with Facebook Ads.

You want to promote inspirational, compelling, memorable content to your target market. Although they’ll consume your content, they won’t convert to leads and sales right away. Remember, love takes time.


Rather, your goal is to bias them so in the future they’ll do a search for your product. If it’s an unbranded search, having been exposed to your marketing materials in the past, they’ll be more likely to click on and choose you now.

Facebook and many other vendors have conducted lift studies that prove that Facebook ads impact clicks and conversions you’ll get from paid and organic search.


You won’t get away with promoting junk. You have to promote your unicorns.

For this, we’ll use Facebook’s:

  • Interest-Based Targeting to reach people who are likely to search for the things you’re selling.
  • Demographic Targeting to reach people who are likely to search for the stuff you’re selling, maybe within the next month.
  • Behavioral Targeting to reach the people who buy stuff that is related to the stuff you’re selling.

For example, let’s say you’re a florist or jeweler. You can target Facebook ads at people who will celebrate an anniversary within the next 30 days.


Why would you want to do this? Because you know these people will be searching for keywords relating to flowers and jewelry soon. That’s how you can start biasing them to get them to have happy thoughts about your business, increasing the likelihood that they’ll click on you, but more importantly, convert.

It’s not just Facebook. You can also buy image display ads on Google’s Display Network. You can use Custom Affinity Audiences to target people who have searched on keywords you’re interested in, but didn’t click through to your site (or you can specify certain categories related to your business).

3. Remarketing


People are busy and have short attention spans. If you aren’t using remarketing, essentially you’re investing a ton of time and money into your SEO and marketing efforts just to get people to visit one time. That’s crazy.

You want to make sure the people who gave you a look to see what your site was about never forget you so that subsequent searches always go your way. You want them to stay engaged and convert.

Remarketing greatly impacts engagement metrics like dwell time, conversion rate, and time on site because people are more familiar with you, which means they’re more likely to be engaged with you for longer.

There’s a reason we spent nearly a million dollars on remarketing last year. Investing in remarketing:

  • Boosted repeat visits by 50 percent.
  • Increased conversions by 51 percent.
  • Grew average time-on-site by 300 percent.

These are huge numbers for a minimal investment (display ads average around $10 for 1,000 views).

It’s your job to convert or squeeze as much money as you can from people who are already in the market for what you sell. So use remarketing to increase brand familiarity and increase user engagement metrics, while simultaneously turning the people who bounced off your site in the past into leads now.

4. Clean up your bad neighborhoods!

If you’ve tried all of the above (and other ways to improve engagement rates) and still have bad neighborhoods on your websites that have low CTR and/or user engagement rates — just delete them. Why?

I believe that terrible engagement metrics will lead to a death spiral where your site gets fewer clicks, fewer leads, fewer sales, and even lower rankings. And who wants that?

Now, I don’t have any proof of this, but the software engineer in me suspects that it would be very difficult for Google to compute engagement rates for every keyword/page combination on the Internet. They would need to lean on a “domain-level engagement score” to fall back on in the event that more granular data wasn’t available. Google does something conceptually similar in AdWords by having both account-level and keyword-level Quality Scores. It’s also similar to how many believe that Google considers links pointing to your domain and also individual pages on your site when computing organic rankings (a moment of silence for our beloved Google PageRank Toolbar). Dumping your very worst neighborhoods — only if all attempts to resuscitate have failed miserably — would, in theory, raise a domain-level score, if it existed.

Obviously better CTRs, higher engagement rates, and improved conversion rates lead to more leads and sales. But I also believe that improvement in these metrics will lead to better organic search rankings, creating a virtuous cycle of even more clicks and conversions.

Conclusion

It’s becoming increasingly clear that organic CTR matters. But you might not realize that high CTRs with low engagement rates aren’t that meaningful.


So no cheap tricks, guys! Don’t invest in sites that specialize in gaming your click-through rates. Even though they might work now to an extent, they won’t work well in the future. Google is good at fighting click fraud on ad networks, so you can expect them to apply those same learnings to fight organic search click fraud.

I would prioritize click-through rate and conversion rate (or engagement) optimization at the very top of the most impactful on-page SEO efforts.

At the very least you’ll get more conversions. But if I’m right, you’ll not only get more conversions, but you’ll get better rankings, which will lead to more conversions and even better rankings.

So use the tactics and strategies from this post to diagnose your engagement rates, and then start optimizing them!

