Google Analytics - Sampled Data presents more sessions than API query

Problem description

I'm working on automating a Google Analytics report using the Core Reporting API V3.

When I request the data for a query that contains a segment I have previously defined, the following happens:

The metrics such as Sessions, Users and Pageviews returned by the API query are higher than the ones shown in Google Analytics Reports. I noticed that the Reports presented by GA mention that they are sampled. This raises doubts, since I would expect sampling to produce lower metrics than the fully counted ones.

How does this make any sense? (Metrics in the non-sampled report being higher than the ones in the sampled report)

Solution

Sampling just means that the data is less accurate: it is equally likely to be greater or less than the true value.

By way of example, suppose that I work in a company with exactly 10,000 employees. The big cheeses want to perform a very detailed survey of their workforce, to make sure that everybody's happy, but think that losing 10,000 hours of work time just isn't OK. Instead, they randomly select 1,000 staff members. So long as the selection is truly random, that should be a representative sample, meaning that the gender balance, ethnicity, percentage with kids, average commute time etc. of this group will be roughly the same as the workforce as a whole.

Similarly, if you ask Google Analytics to run a report that requires a lot of aggregation, it might decide to look at only half your data. Even the simplest requests often require a lot of computation; from their perspective it's much cheaper to randomly select only 40% or 50% of the sessions in that period, and scale the results up.

They multiply the results afterwards to compensate, so the results that you see will be approximately equal to the true value. The biggest variation will come in things that don't happen very often; suppose you had an event for 'someone just spent £1,000' that's likely to take place once a year. If this randomly comes up in Google's sample, it might decide that it happens twice a year. Otherwise, it might think it never happens.
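
The scale-up described above can be sketched with a small simulation. This is only an illustration under made-up assumptions (10,000 sessions of 3 pageviews each, a 50% sampling rate, simple per-session random selection); it is not Google's actual sampling algorithm, and `sampled_estimate` is a hypothetical helper.

```python
import random

def sampled_estimate(session_values, rate, rng):
    """Keep each session with probability `rate`, then scale the total by 1/rate."""
    kept = [v for v in session_values if rng.random() < rate]
    return sum(kept) / rate

rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumed data: 10,000 sessions with 3 pageviews each.
pageviews = [3] * 10_000
true_total = sum(pageviews)

# Repeat the sampled report 200 times and compare against the true total.
estimates = [sampled_estimate(pageviews, 0.5, rng) for _ in range(200)]
over = sum(e > true_total for e in estimates)
under = sum(e < true_total for e in estimates)
print(f"true total: {true_total}, over: {over}, under: {under}")

# A rare event: exactly one "someone just spent 1,000" hit among the sessions.
rare = [1] + [0] * 9_999
rare_estimates = {sampled_estimate(rare, 0.5, rng) for _ in range(200)}
print(f"estimates for the rare event: {sorted(rare_estimates)}")
```

The common metric lands on both sides of the true total, while the rare event can only be reported as 0 or 2, never as the true value of 1.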

If you're facing heavy sampling, there are several ways to avoid it. I recommend the following:

  • Avoid the Users metric; it's one of the most time-consuming to calculate.
  • Keep your time periods short.
  • Avoid using complicated segments.
  • Try not to use too many dimensions at once.
  • Try not to have so many hits! Do you have a ton of superfluous events? Are you using the same code on more than one site? Overusing Virtual Page Views?
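
As one way to keep time periods short when automating a report, a long date range can be split into smaller chunks that are queried one at a time. The sketch below only builds the date pairs, in the `YYYY-MM-DD` form that the API's `start-date` and `end-date` parameters accept; `split_date_range` is a hypothetical helper and the 7-day chunk size is an arbitrary assumption.

```python
from datetime import date, timedelta

def split_date_range(start, end, days_per_chunk):
    """Split an inclusive date range into consecutive chunks of at most `days_per_chunk` days."""
    chunks = []
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=days_per_chunk - 1), end)
        chunks.append((cursor.isoformat(), chunk_end.isoformat()))
        cursor = chunk_end + timedelta(days=1)
    return chunks

# Example: January 2015 in 7-day chunks.
chunks = split_date_range(date(2015, 1, 1), date(2015, 1, 31), 7)
for start_date, end_date in chunks:
    print(start_date, end_date)
```

Additive metrics such as Sessions and Pageviews can be summed across chunks. Users cannot, because the same user may appear in more than one chunk.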

If you have Google Analytics Premium, you can request Unsampled reports, although you should watch out for the exported totals given for the Users metric; they still screw this up.

Sampling can happen at any rate; in extreme situations they might cut you down to less than 1% of sessions. You should take any sampled stats with a pinch of salt, but also understand that they know what they're doing. If your sample size is 50% or more, you're fine. Any less than 40% and you should start to be worried. If you're getting less than about 1%, you're really stretching Google Analytics beyond its breaking point, so don't be surprised if it's not doing its best to help you.
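
When pulling data through the Core Reporting API V3, the response body reports whether sampling was applied via the `containsSampledData`, `sampleSize` and `sampleSpace` fields. Below is a minimal sketch of applying the rules of thumb above to a response that has already been parsed into a dict. The example response is made up, `sampling_rate` and `assess` are hypothetical helpers, and the "borderline" label for the 40% to 50% gap is an assumption, since the answer does not say how to treat that range.

```python
def sampling_rate(response):
    """Return the fraction of sessions used for the report (1.0 if unsampled)."""
    if not response.get("containsSampledData"):
        return 1.0
    # The API returns sampleSize and sampleSpace as strings.
    return int(response["sampleSize"]) / int(response["sampleSpace"])

def assess(rate):
    """Apply the thresholds from the answer; 'borderline' is an assumed label."""
    if rate >= 0.5:
        return "fine"
    if rate >= 0.4:
        return "borderline"
    return "worrying"

# Made-up response fragment for illustration.
example = {
    "containsSampledData": True,
    "sampleSize": "250000",
    "sampleSpace": "1000000",
}
rate = sampling_rate(example)
print(f"{rate:.0%} of sessions sampled: {assess(rate)}")
```

Logging this rate next to each automated report makes it easier to tell whether a gap between the API numbers and the web interface is likely to be sampling noise.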
