Google Analytics - Sampled Data presents more sessions than API query


Problem description

I'm working on automating a Google Analytics report using the Core Reporting API V3.

When I request the data for a query that contains a segment I have previously defined, the following happens:

The metrics such as Sessions, Users and Pageviews that are reported by the query obtained with the API are higher than the ones shown in Google Analytics Reports. I noticed that the Reports presented by GA mention that they are sampled. This raises doubts, since I would expect the effect of sampling to be lower metrics than the fully counted ones.

How does this make any sense? (The metrics in the non-sampled report are higher than the ones in the sampled report.)

Solution

Sampling just means that the data is less accurate: it is equally likely to be greater or less than the true value.

By way of example, suppose that I work in a company with exactly 10,000 employees. The big cheeses want to perform a very detailed survey of their workforce, to make sure that everybody's happy, but think that losing 10,000 hours of work time just isn't OK. Instead, they randomly select 1,000 staff members. So long as the selection is truly random, that should be a representative sample, meaning that the gender balance, ethnicity, percentage with kids, average commute time etc. of this group will be roughly the same as the workforce as a whole.

Similarly, if you ask Google Analytics to run a report that requires a lot of aggregation, it might decide to look at only half your data. Even the simplest requests often require a lot of computation; from their perspective it's much cheaper to randomly select only 40% or 50% of the sessions in that period, and scale the results up.

They multiply the results afterwards to compensate, so the results that you see will be approximately equal to the true value. The biggest variation will come in things that don't happen very often; suppose you had an event for 'someone just spent £1,000' that's likely to take place once a year. If this randomly comes up in Google's sample, it might decide that it happens twice a year. Otherwise, it might think it never happens.
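To make the "scale the results up" idea concrete, here is a minimal simulation sketch. It is not how Google Analytics samples internally; it simply assumes a uniform random sample of sessions, and the data (10,000 sessions, 3,000 of them matching a segment) and the name `sampled_estimate` are made up for illustration. It shows that the scaled-up estimate lands above the true value about as often as below it.

```python
import random

def sampled_estimate(session_flags, rate, rng):
    """Estimate how many sessions match by inspecting only a random fraction."""
    sample_size = max(1, round(len(session_flags) * rate))
    sample = rng.sample(session_flags, sample_size)
    # Scale the count found in the sample back up to the full population.
    return sum(sample) * len(session_flags) / sample_size

rng = random.Random(42)
# Hypothetical data: 10,000 sessions, 3,000 of which match the segment.
session_flags = [True] * 3000 + [False] * 7000
true_total = sum(session_flags)

# Repeat a 50% sample many times and compare each estimate with the truth.
estimates = [sampled_estimate(session_flags, 0.5, rng) for _ in range(200)]
above = sum(e > true_total for e in estimates)
below = sum(e < true_total for e in estimates)
print(f"true={true_total} above={above} below={below}")
```

Running it should show a roughly even split between overestimates and underestimates, which is why a sampled report can show either more or fewer sessions than an unsampled one.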

If you're facing heavy sampling, there are several ways to avoid it. I recommend the following:

  • Avoid the Users metric; it's one of the most time-consuming to calculate.
  • Keep your time periods short.
  • Avoid using complicated segments.
  • Try not to use too many dimensions at once.
  • Try not to have so many hits! Do you have a ton of superfluous events? Are you using the same code on more than one site? Overusing Virtual Page Views?
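
One way to act on "keep your time periods short" is to split a long date range into smaller chunks and issue one query per chunk. The helper below is only a sketch: `split_date_range` is a hypothetical name, the 7-day chunk size is an arbitrary choice, and the code deliberately stops short of making any API call. The ISO dates it prints are in the `YYYY-MM-DD` form that the Core Reporting API's `start-date` and `end-date` parameters accept.

```python
from datetime import date, timedelta

def split_date_range(start, end, max_days):
    """Yield inclusive (chunk_start, chunk_end) pairs of at most max_days each."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)

# Example: one month split into chunks of at most 7 days.
chunks = list(split_date_range(date(2015, 1, 1), date(2015, 1, 31), 7))
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

Sessions and Pageviews from each chunk can be added together afterwards. Users cannot, because a visitor who appears in two chunks would be counted twice.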

If you have Google Analytics Premium, you can request Unsampled reports, although you should watch out for the exported totals given for the Users metric; they still screw this up.

Sampling can happen at any rate; in extreme situations they might cut you down to less than 1% of sessions. You should take any sampled stats with a pinch of salt, but also understand that they know what they're doing. If your sample size is 50% or more, you're fine. Any less than 40% and you should start to be worried. If you're getting less than about 1% you're really stretching Google Analytics beyond its breaking point, so don't be surprised if it's not doing its best to help you.
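
If you're pulling data through the API, you can check how heavily a given response was sampled. The sketch below assumes the response has already been parsed into a dict and that it carries the `containsSampledData`, `sampleSize` and `sampleSpace` fields, which is how I understand the Core Reporting API v3 response format; verify this against the official reference. The function names, the sample response and the "borderline" label for the 40% to 50% gap are my own additions; the other thresholds are the rules of thumb above.

```python
def sampling_rate(report):
    """Return the fraction of sessions used, or 1.0 if the report is unsampled."""
    if not report.get("containsSampledData"):
        return 1.0
    # The counts are assumed to arrive as strings, so convert them first.
    return int(report["sampleSize"]) / int(report["sampleSpace"])

def describe_sampling(rate):
    """Map a sampling rate onto the rule-of-thumb categories from the answer."""
    if rate >= 0.5:
        return "fine"
    if rate >= 0.4:
        return "borderline"  # the answer gives no verdict for 40-50%
    if rate >= 0.01:
        return "worrying"
    return "beyond breaking point"

# Hypothetical parsed response, not real data.
report = {
    "containsSampledData": True,
    "sampleSize": "250000",
    "sampleSpace": "1000000",
}
rate = sampling_rate(report)
print(f"{rate:.0%} of sessions sampled: {describe_sampling(rate)}")
```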
