Ingesting Google Analytics data into S3 or Redshift


Question

I am looking for options to ingest Google Analytics data (historical data as well) into Redshift. Any suggestions regarding tools and APIs are welcome. I searched online and found Stitch as one of the ETL tools; help me understand this option better, and any others you may have.

Answer

Google Analytics has an API (the Core Reporting API). This is good for pulling the occasional KPI, but due to API limits it is not well suited to exporting large amounts of historical data.
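For occasional KPI pulls, a Reporting API v4 request looks roughly like the sketch below. The view ID, dates, and dataset names are placeholders; a real call additionally needs OAuth credentials and the `google-api-python-client` package (`googleapiclient.discovery.build("analyticsreporting", "v4", ...)`), which are omitted here so the request-building logic stands alone:

```python
# Sketch of a Core Reporting API v4 batchGet request body.
# The view ID and date range below are placeholders, not real values.

def build_report_request(view_id, start_date, end_date,
                         metrics=("ga:sessions",), dimensions=("ga:date",)):
    """Assemble the batchGet body for a single report request."""
    return {
        "reportRequests": [{
            "viewId": view_id,
            "dateRanges": [{"startDate": start_date, "endDate": end_date}],
            "metrics": [{"expression": m} for m in metrics],
            "dimensions": [{"name": d} for d in dimensions],
            # The API caps pageSize at 100,000 rows per request --
            # one of the limits that makes bulk historical export painful.
            "pageSize": 100000,
        }]
    }

body = build_report_request("123456789", "2019-01-01", "2019-01-31")
# With credentials in place you would then call:
#   analytics.reports().batchGet(body=body).execute()
```

Paging through months of hit-level history this way means many sequential requests against daily quotas, which is why the answer below recommends the BigQuery link for bulk data.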

For big data dumps it is better to use the link to BigQuery ("link" because I want to avoid the word "integration", which implies a larger level of control than you actually have).

Setting up the link to BigQuery is fairly easy - you create a project in the Google Cloud Console, enable billing (BigQuery comes with a fee; it is not part of the GA360 contract), add your email address as BigQuery Owner in the "IAM & Admin" section, go to your GA account, and enter the BigQuery project ID in the GA Admin section under "Property Settings/Product Linking/All Products/BigQuery Link". The process is described here: https://support.google.com/analytics/answer/3416092

You can choose between standard updates and streaming updates - the latter comes with an extra fee but gives you near-realtime data. The former updates the data in BigQuery three times a day, every eight hours.

The exported data is not raw data; it is already sessionized (i.e. while you get one row per hit, things like the traffic attribution for that hit are session-based).
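Because attribution lives at the session level, a hit-level query typically flattens the nested hits while carrying the session-scoped fields onto every row. A sketch of such a BigQuery standard SQL query, built as a string (the project, dataset, and table names are placeholders for your own export dataset):

```python
def hit_level_query(table="my_project.my_dataset.ga_sessions_20190101"):
    """Build a BigQuery standard SQL query that flattens the nested
    hits array while keeping session-scoped attribution fields
    (trafficSource) on every hit row."""
    return f"""
    SELECT
      fullVisitorId,
      visitId,
      h.hitNumber,
      h.page.pagePath,
      -- trafficSource is session-scoped: the same source/medium
      -- values repeat on every hit row of a session.
      trafficSource.source,
      trafficSource.medium
    FROM `{table}`,
      UNNEST(hits) AS h
    """

sql = hit_level_query()
```

Running this requires a BigQuery client (e.g. `google.cloud.bigquery.Client().query(sql)`); the point here is only the `UNNEST` pattern for the sessionized export schema.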

You will pay three different kinds of fees - one for the export to BigQuery, one for storage, and one for the actual querying. Pricing is documented here: https://cloud.google.com/bigquery/pricing
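As a rough illustration of the query component: on-demand query pricing is per byte scanned, so cost scales with the columns and date range a query touches, not with the size of the result. The rate below is an illustrative placeholder, not a quoted price - check the pricing page for current numbers:

```python
def query_cost_usd(bytes_scanned, usd_per_tib=5.00):
    """Estimate on-demand query cost from bytes scanned.
    usd_per_tib is an illustrative placeholder rate, not a real quote."""
    TIB = 2 ** 40  # one tebibyte in bytes
    return bytes_scanned / TIB * usd_per_tib

# Scanning 250 GiB across daily tables at the placeholder rate:
cost = query_cost_usd(250 * 2 ** 30)
```

A `SELECT *` over a year of daily tables scans every column of every day, so selecting only the needed columns and restricting the date range is the main cost lever.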

Pricing depends on region, among other things. The region where the data is stored may also matter when it comes to legal issues - e.g. if you have to comply with the GDPR, your data should be stored in the EU. Make sure you get the region right, because moving data between regions is cumbersome (you need to export the tables to Google Cloud Storage and re-import them in the proper region) and rather expensive.

You cannot simply delete the data and do a fresh export - on your first export, BigQuery backfills the data for the last 13 months, but it does this only once per view. So if you need the historical data, get this right the first time, because if you delete data in BQ you will not get it back.

I do not actually know much about Redshift, but as per your comment you want to display the data in Tableau, and Tableau connects directly to BigQuery.

We use custom SQL queries to get the data into Tableau (Google Analytics data is stored in daily tables, and custom SQL seems the easiest way to query data across many tables). BigQuery has a per-user cache that lasts 24 hours as long as the query does not change, so you will not pay for the query every time a report is opened. It is still a good idea to keep an eye on cost - cost is based not on the size of the result but on the amount of data that has to be scanned to produce it, so if you query over a long timeframe and maybe do a few joins, a single query can run into the dozens of euros (multiplied by the number of users who run it).
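The daily-table layout mentioned above is usually handled with a wildcard table plus a `_TABLE_SUFFIX` filter; restricting the suffix range also prunes the tables BigQuery scans, and therefore the cost. A sketch, again with the dataset name as a placeholder:

```python
def date_range_query(dataset="my_project.my_dataset",
                     start="20190101", end="20190131"):
    """Build a query spanning a range of ga_sessions_ daily tables.
    The _TABLE_SUFFIX filter prunes tables outside the range, so
    BigQuery only scans (and bills) the days actually requested."""
    return f"""
    SELECT
      date,
      SUM(totals.visits) AS sessions,
      SUM(totals.pageviews) AS pageviews
    FROM `{dataset}.ga_sessions_*`
    WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'
    GROUP BY date
    ORDER BY date
    """
```

A query string like this can be pasted directly into Tableau's custom SQL dialog against a BigQuery connection.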

