Atomic inserts in BigQuery

Question

When I load more than one CSV file, how does BigQuery handle the errors?

bq load --max_bad_records=30 dbname.finalsep20xyz gs://sep20new/abc.csv.gz,gs://sep20new/xyzcsv.gz

A few files in the batch job may fail to load because their column count does not match the expected number. I still want to load the rest of the files. If abc.csv fails, will xyz.csv still be loaded? Or will the entire job fail, with no records inserted?

I tried with dummy records but could not conclusively determine how errors across multiple files are handled.

Solution

Loads are atomic -- either all files commit or no files do. You can break the loads up into multiple jobs if you want them to complete independently. An alternative would be to set max_bad_records to something much higher.
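As a rough illustration of the "multiple jobs" option, the sketch below issues one `bq load` per file, so a failure in one file does not affect the others. It assumes the `bq` CLI is installed and authenticated and that the destination table already exists. The function names `build_load_command` and `load_each_file`, and the `runner` parameter, are hypothetical helpers introduced here, not part of any BigQuery tooling.

```python
import subprocess
from typing import Callable, Dict, List, Sequence


def build_load_command(table: str, uri: str, max_bad_records: int = 30) -> List[str]:
    """Build the argv for a single-file `bq load` job (hypothetical helper)."""
    return ["bq", "load", f"--max_bad_records={max_bad_records}", table, uri]


def load_each_file(
    table: str,
    uris: Sequence[str],
    max_bad_records: int = 30,
    runner: Callable[[List[str]], int] = subprocess.call,
) -> Dict[str, bool]:
    """Run one load job per URI and report which ones succeeded.

    `runner` is an assumption made for this sketch: it takes an argv list and
    returns an exit code, which lets the CLI call be swapped out in tests.
    """
    results: Dict[str, bool] = {}
    for uri in uris:
        # Each invocation is a separate job, so a failure here does not
        # roll back files committed by the other jobs.
        exit_code = runner(build_load_command(table, uri, max_bad_records))
        results[uri] = exit_code == 0
    return results
```

Calling `load_each_file("dbname.finalsep20xyz", ["gs://sep20new/abc.csv.gz", "gs://sep20new/xyzcsv.gz"])` would then return a per-file success map, which makes it easy to retry or inspect only the files that failed. The trade-off is more jobs counted against the load quota.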

We would still prefer that you launch fewer jobs with more files, since we have more flexibility in how we handle the imports. That said, recent changes to load quotas mean that you can submit more simultaneous load jobs, and still higher quotas are planned soon.

Also please note that all BigQuery actions that modify BQ state (load, copy, query with a destination table) are atomic; the only job type that isn't atomic is extract, since there is a chance that it might fail after having written out some of the exported data.
