How to export a GA360 table from BigQuery to Snowflake through GCS as a JSON file without data loss?
Question
I am exporting a GA360 table from BigQuery to Snowflake in JSON format using the bq CLI command. I lose some fields when I load it as a table in Snowflake. I use the COPY command to load my JSON data from the GCS external stage into Snowflake tables. But I am missing some fields that are part of a nested array. I even tried compressing the file when I export to GCS, but I still lose data. Can someone suggest how I can do this? I don't want to flatten the table in BigQuery and transfer that. My daily table size ranges from 1.5 GB to 4 GB.
bq extract \
--project_id=myproject \
--destination_format=NEWLINE_DELIMITED_JSON \
--compression GZIP \
datasetid.ga_sessions_20191001 \
gs://test_bucket/ga_sessions_20191001-*.json
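Since the export is sharded and gzipped, one quick sanity check is to count the rows across the downloaded shards and compare the total with the BigQuery row count. A minimal sketch, assuming the shards have been copied locally (the glob pattern is hypothetical):

```python
import glob
import gzip

def count_ndjson_rows(pattern):
    """Count non-empty lines across gzipped NDJSON shards matching a glob."""
    total = 0
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            total += sum(1 for line in f if line.strip())
    return total

# Usage (hypothetical local copy of the export):
# count_ndjson_rows("ga_sessions_20191001-*.json")
```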
I have set up my integration, file format, and stage in Snowflake. I am copying data from this bucket to a table that has one VARIANT field. The row count matches BigQuery, but the fields are missing. I am guessing this is due to Snowflake's limit of 16 MB per VARIANT value. Is there some way I can compress each variant field to be under 16 MB?
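Snowflake does cap each VARIANT value at 16 MB, so before blaming the limit it is worth measuring the exported rows directly. A minimal sketch that flags oversized NDJSON lines (the file name in the usage comment is hypothetical):

```python
# Snowflake caps each VARIANT value at 16 MB, so any NDJSON row larger
# than that cannot be loaded into a single variant column.
VARIANT_LIMIT = 16 * 1024 * 1024

def oversized_rows(lines, limit=VARIANT_LIMIT):
    """Return (line_number, byte_size) for NDJSON rows exceeding the limit."""
    flagged = []
    for i, line in enumerate(lines, start=1):
        size = len(line.encode("utf-8"))
        if size > limit:
            flagged.append((i, size))
    return flagged

# Usage (hypothetical file name):
# with open("ga_sessions_20191001-000000000000.json") as f:
#     print(oversized_rows(f))
```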
Answer
I had no problem exporting GA360 and getting the full objects into Snowflake.
First I exported the demo table bigquery-public-data.google_analytics_sample.ga_sessions_20170801
into GCS, JSON formatted.
Then I loaded it into Snowflake:
create or replace table ga_demo2(src variant);
COPY INTO ga_demo2
FROM 'gcs://[...]/ga_sessions000000000000'
FILE_FORMAT=(TYPE='JSON');
And then to find the transactionIds:
SELECT src:visitId, hit.value:transaction.transactionId
FROM ga_demo2, lateral flatten(input => src:hits) hit
WHERE src:visitId='1501621191'
LIMIT 10
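For readers unfamiliar with LATERAL FLATTEN, the query above produces one output row per element of the session's `hits` array. A minimal Python sketch of the same traversal, using a hand-made session object shaped like a GA360 export row (the field values are made up):

```python
# Hand-made session object shaped like one GA360 export row (values made up).
session = {
    "visitId": "1501621191",
    "hits": [
        {"transaction": {"transactionId": "ORD123"}},
        {"transaction": {}},  # a hit without a transactionId yields None (NULL)
    ],
}

def transaction_ids(src):
    """Mimic FLATTEN(input => src:hits): one result per element of hits."""
    return [hit.get("transaction", {}).get("transactionId")
            for hit in src.get("hits", [])]

print(transaction_ids(session))  # ['ORD123', None]
```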
Cool things to note:
- I could read the GCS files easily from Snowflake deployed in AWS.
- JSON manipulation in Snowflake is really cool.
See https://hoffa.medium.com/funnel-analytics-with-sql-match-recognize-on-snowflake-8bd576d9b7b1.