Best Practice to migrate data from MySQL to BigQuery


Problem description

I tried several CSV formats (different escape characters, quotes, and other settings) to export data from MySQL and import it into BigQuery, but I was not able to find a solution that works in every case.

Google Cloud SQL requires the following code for importing/exporting from/to MySQL. Although Cloud SQL is not BigQuery, it is a good starting point:

SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8' 
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '' FROM table

At the moment I use the following command to import a compressed CSV into BigQuery:

  bq --nosync load -F "," --null_marker "NULL" --format=csv PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json

On the one hand, the bq command does not allow setting the escape character (" is escaped by another ", which seems to be a well-defined CSV format). On the other hand, using \" as the escape character for the MySQL export would lead to "N as the null value, which does not work either:

CSV table references column position 34, but line starting at position:0 contains only 34 columns. (error code: invalid)
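
To make the escaping problem concrete, here is roughly what MySQL writes for a NULL field under the different ESCAPED BY settings, per MySQL's documented INTO OUTFILE behavior (a hypothetical one-row table holding the values 1 and NULL, comma-delimited):

  ESCAPED BY '\\' writes:  1,\N    -- matches bq's --null_marker "\N", but bq offers no flag to treat \ as an escape
  ESCAPED BY '\"' writes:  1,"N    -- bq parses "N as the start of a quoted field
  ESCAPED BY ''   writes:  1,NULL  -- matches --null_marker "NULL", but quotes and delimiters inside the data are then not escaped at all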

So my question is: how do I write a (table-independent) export command for MySQL in SQL, such that the generated file can be loaded into BigQuery? Which escape character should be used, and how should null values be handled/set?

Solution

I've been running into the same problem; here's my solution:

Exporting data from MySQL

First, export the data from MySQL this way:

SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY ''
FROM <yourtable>

This is in reality a TSV file (tab-separated values), but you can import it as a CSV all the same.

Import into BigQuery

This way, you should be able to import it into BigQuery with the following parameters:

bq load --field_delimiter="\t" --null_marker="\N" --quote="" \
PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json
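
Putting the whole flow together, here is a minimal end-to-end sketch. The paths, database, table, bucket, and project names are placeholders; it assumes gsutil is installed and authenticated, and that the MySQL server's secure_file_priv setting permits writing to the chosen directory:

  # 1. Export from MySQL (the server writes the file on its own filesystem)
  mysql -e "SELECT * INTO OUTFILE '/var/lib/mysql-files/data.csv' \
    CHARACTER SET 'utf8' FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '' \
    FROM mydb.mytable"

  # 2. Compress the file and upload it to Cloud Storage
  gzip /var/lib/mysql-files/data.csv
  gsutil cp /var/lib/mysql-files/data.csv.gz gs://bucket/data.csv.gz

  # 3. Load into BigQuery with the parameters above
  bq load --field_delimiter="\t" --null_marker="\N" --quote="" \
    PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json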

Notes

  1. If any field in your MySQL database contains a tab character (\t), it will break your columns. To prevent that, you can apply the SQL function REPLACE(<column>, '\t', ' ') to the affected columns, which converts the tabs to spaces (see the sketch after these notes).

  2. If you set the table schema in BigQuery's web interface, you won't need to specify it every time you load a CSV.
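
For note 1, a sketch of what the export could look like with REPLACE applied. The column names (id, name, comments) are hypothetical; only free-text columns that may contain tabs need the treatment:

  SELECT id,
         name,
         REPLACE(comments, '\t', ' ') AS comments  -- convert tabs to spaces so the delimiter stays unambiguous
  INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8'
  FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY ''
  FROM <yourtable>;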

I hope this works for you.
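
Once the load has finished, a quick way to sanity-check the result (same PROJECT:DATASET.tableName placeholder as above):

  bq show PROJECT:DATASET.tableName        # confirm the schema and the total row count
  bq head -n 5 PROJECT:DATASET.tableName   # inspect the first few rows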
