Getting full, unsampled data from GA using R / rga


Problem description

I'm using skardhamar's rga ga$getData to query GA and get all data in an unsampled manner. The data is based on more than 500k sessions per day.

At https://github.com/skardhamar/rga, the paragraph 'extracting more observations than 10,000' mentions that this is possible by using batch = TRUE. The paragraph 'Get the data unsampled' also mentions that by walking over the days, you can get unsampled data. I'm trying to combine these two, but I cannot get it to work. E.g.

# xxx stands for the GA profile (view) ID
ga$getData(xxx,
    start.date = "2015-03-30", 
    end.date = "2015-03-31",
    metrics = "ga:totalEvents", 
    dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel", 
    sort = "", 
    filters = "", 
    segment = "",
    batch = TRUE, walk = TRUE
    )

.. indeed gets unsampled data, but not all of it. I get a data frame with only 20k rows (10k per day). The result is capped at 10k rows per day, contrary to what I expected from the batch = TRUE setting. So for these two days, I get a data frame of 20k rows after seeing this output:

Run (1/2): for date 2015-03-30
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations
Run (2/2): for date 2015-03-31
Pulling 10000 observations in batches of 10000
Run (1/1): observations [1;10000]. Batch size: 10000
Received: 10000 observations
Received: 10000 observations

When I leave out the walk = TRUE setting, I do get all observations (771k rows, around 335k per day), but only in a sampled manner:

# Same query without walk = TRUE
ga$getData(xxx,
   start.date = "2015-03-30", 
   end.date = "2015-03-31",
   metrics = "ga:totalEvents", 
   dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel", 
   sort = "", 
   filters = "", 
   segment = "",
   batch = TRUE
   )

Notice: Data set contains sampled data
Pulling 771501 observations in batches of 10000
Run (1/78): observations [1;10000]. Batch size: 10000
Notice: Data set contains sampled data
...

Is my data just too big to get all observations unsampled?

Recommended answer

You could try querying by device with filters = "ga:deviceCategory==desktop" (and filters = "ga:deviceCategory!=desktop" respectively) and then merging the resulting dataframes.

I'm assuming that your users use different devices to access your site. The underlying logic is that when you filter data, Google Analytics servers filter it before you get it, so you can "divide" your query and get unsampled data. I think it is the same methodology as the "walk" function.

# Desktop only
ga$getData(xxx,
    start.date = "2015-03-30", 
    end.date = "2015-03-31",
    metrics = "ga:totalEvents", 
    dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel", 
    sort = "", 
    filters = "ga:deviceCategory==desktop", 
    segment = "",
    batch = TRUE, walk = TRUE
    )



Mobile and Tablet

# Everything that is not desktop (mobile and tablet)
ga$getData(xxx,
    start.date = "2015-03-30", 
    end.date = "2015-03-31",
    metrics = "ga:totalEvents", 
    dimensions = "ga:date,ga:customVarValue4,ga:eventCategory,ga:eventAction,ga:eventLabel", 
    sort = "", 
    filters = "ga:deviceCategory!=desktop", 
    segment = "",
    batch = TRUE, walk = TRUE
    )
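
The merging step is not shown above, so here is a minimal sketch of one way it could look in base R. The two small data frames are invented stand-ins for the results of the desktop and non-desktop queries, and the column names (date, eventCategory, totalEvents) are assumptions about how the returned columns are named; check names() on your real results and adjust accordingly.

```r
# Hypothetical stand-ins for the two ga$getData() results (desktop and
# non-desktop); the rows and values are invented for illustration.
desktop <- data.frame(
  date = c("20150330", "20150331"),
  eventCategory = c("video", "video"),
  totalEvents = c(120, 95),
  stringsAsFactors = FALSE
)
non_desktop <- data.frame(
  date = c("20150330", "20150331"),
  eventCategory = c("video", "click"),
  totalEvents = c(30, 12),
  stringsAsFactors = FALSE
)

# Stack the two results; both must have the same columns.
combined <- rbind(desktop, non_desktop)

# ga:deviceCategory is not among the queried dimensions, so the same
# dimension combination can appear in both parts. Summing the metric
# collapses such duplicates into one row each.
totals <- aggregate(totalEvents ~ date + eventCategory,
                    data = combined, FUN = sum)
print(totals)
```

Whether the aggregation step is needed depends on the query: if ga:deviceCategory were added to the dimensions, the two parts could not overlap and a plain rbind would be enough. Note also that each part is still subject to the same per-day row limit seen in the question, so it is worth checking the row count per date in each result.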
