Atomic inserts in BigQuery

Problem Description

When I load more than one CSV file, how does BigQuery handle the errors?

bq load --max_bad_records=30 dbname.finalsep20xyz gs://sep20new/abc.csv.gz,gs://sep20new/xyzcsv.gz

There are a few files in the batch job that may fail to load because the number of columns will not match what is expected. I still want to load the rest of the files. If the file abc.csv fails, will the xyz.csv file still be loaded? Or will the entire job fail with no records inserted?

I tried with dummy records but could not conclusively determine how errors across multiple files are handled.

Recommended Answer

Loads are atomic -- either all files commit or no files do. You can break the loads up into multiple jobs if you want them to complete independently. An alternative would be to set max_bad_records to something much higher.
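
As a rough sketch of both options, reusing the bucket, file names, and the dbname.finalsep20xyz table from the question (the threshold of 1000 below is just an illustrative value):

# Option 1: one load job per file, so each file commits or fails independently
bq load --max_bad_records=30 dbname.finalsep20xyz gs://sep20new/abc.csv.gz
bq load --max_bad_records=30 dbname.finalsep20xyz gs://sep20new/xyzcsv.gz

# Option 2: a single job, but with a much higher bad-record allowance
bq load --max_bad_records=1000 dbname.finalsep20xyz gs://sep20new/abc.csv.gz,gs://sep20new/xyzcsv.gz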

We would still prefer that you launch fewer jobs with more files, since we have more flexibility in how we handle the imports. That said, recent changes to load quotas mean that you can submit more simultaneous load jobs, and still higher quotas are planned soon.

Also please note that all BigQuery actions that modify BQ state (load, copy, query with a destination table) are atomic; the only job type that isn't atomic is extract, since there is a chance that it might fail after having written out some of the exported data.
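
One way to see this atomicity in practice (a sketch, again assuming the dbname.finalsep20xyz table from the question) is to compare the table's row count before and after a load job that fails: the count is unchanged, even if some of the input rows were valid.

# Row count before the load (reported as "Total Rows")
bq show dbname.finalsep20xyz

# After a failed load, find the job and inspect its errors
bq ls -j -n 5
bq show -j <job_id>

# Row count after the failed load: identical to the count before it
bq show dbname.finalsep20xyz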
