BigQuery load - control character as delimiter

Problem description

We have files to load in which field values are separated by the "unit separator" character, 0x1f.
According to the docs, a delimiter that is not printable should be encoded in UTF-8.

Using the bq CLI, I tried passing U+001F to the -F argument, to no avail: BigQuery error in load operation: Field delimiter must be a single character, found: "U+001F".
No luck with 0x1F or \x1f either, with or without quotes.

Have I got the encoding wrong, or is this a bug in bq or in the API?

EDIT:
After playing with the explorer, it turns out that it is the API that doesn't like the delimiter. Besides the printable delimiters, you can use \t, and apparently also the undocumented \b (backspace) and \f (form feed).
A tab could be a valid user-entered character in a free-form text field, so we need to use a control character (after conversion from the unit separator).
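
To illustrate why a tab is a risky delimiter for free-form text while the unit separator is not, here is a minimal Python sketch. The sample rows are made up for illustration, and the sketch only uses the standard `csv` module. It says nothing about how BigQuery itself parses the file.

```python
import csv
import io

UNIT_SEP = "\x1f"  # ASCII unit separator (U+001F)

# Hypothetical sample rows; the second field of the first row contains a tab.
rows = [["1", "free\ttext with a tab", "ok"], ["2", "plain", "ok"]]

# Write the rows with the unit separator as the field delimiter.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=UNIT_SEP, lineterminator="\n")
writer.writerows(rows)

# Reading them back with the same delimiter recovers the original fields.
parsed = list(csv.reader(io.StringIO(buf.getvalue()), delimiter=UNIT_SEP))

# A naive tab-delimited encoding of the first row splits into too many fields.
naive_tab_fields = "\t".join(rows[0]).split("\t")

print(parsed == rows)
print(len(rows[0]), len(naive_tab_fields))
```

The embedded tab survives the round trip with the unit separator, whereas the tab-delimited version of the same row yields one field too many.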

Note that \f as a delimiter does work fine through the API directly, but not through the bq CLI (Field delimiter must be a single character, found: "\f").

Recommended answer

Actually, courtesy of GCP support, this works on Linux:

bq load --autodetect --field_delimiter=$(printf '\x1f') [DATASET].[TABLE] gs://[BUCKET]/simple.csv

On Windows, generating a control character on the command line is not as straightforward. It is easier if you use PowerShell.
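
Where the shell cannot easily produce the character, one possible alternative is to build the argument list in a small Python script, so that the literal 0x1f character goes into the argument without any shell quoting. This is only a sketch under assumptions: the helper name `build_bq_load_args` and the dataset, table, and bucket names are hypothetical, it assumes `bq` is on the PATH, and whether the character reaches `bq` intact on Windows has not been verified here.

```python
import subprocess


def build_bq_load_args(table: str, uri: str, delimiter: str = "\x1f") -> list[str]:
    """Build the argument list for `bq load` with a literal delimiter character.

    Hypothetical helper: it uses the same flags as the Linux command above.
    """
    return [
        "bq",
        "load",
        "--autodetect",
        f"--field_delimiter={delimiter}",
        table,
        uri,
    ]


# Placeholder names, to be replaced with a real dataset, table, and bucket.
args = build_bq_load_args("mydataset.mytable", "gs://my-bucket/simple.csv")

# Uncomment to actually run the load (requires an installed, authenticated bq):
# subprocess.run(args, check=True)
```

Passing a list to `subprocess.run` avoids a shell altogether, which is the point of the sketch. On Windows the `bq` launcher may be a `.cmd` file, which may need a different way of invoking it.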

I agree with @Felipe that this is currently a limitation of the bq CLI tool, but in my view it could easily be fixed in the source code with a .decode('utf-8') on the argument bytes, so that

 --field_delimiter=\x1f 

can work as-is on any platform.
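
As a rough illustration of that kind of fix, the following Python sketch shows how a command-line tool could turn a textual escape such as `\x1f` into the single character it names. It is not taken from the bq source. The function name `decode_delimiter` is hypothetical, and it uses Python's `unicode_escape` codec rather than the `.decode('utf-8')` call mentioned above.

```python
def decode_delimiter(arg: str) -> str:
    r"""Interpret backslash escapes such as \x1f or \t in a delimiter argument.

    Hypothetical helper: raises ValueError unless the result is one character.
    """
    # Encode to bytes first so the unicode_escape codec can process the escapes;
    # characters outside Latin-1 are turned into backslash escapes themselves.
    decoded = arg.encode("latin-1", "backslashreplace").decode("unicode_escape")
    if len(decoded) != 1:
        raise ValueError(
            f"Field delimiter must be a single character, found: {arg!r}"
        )
    return decoded


unit_sep = decode_delimiter(r"\x1f")  # the literal backslash-x-1-f text
tab = decode_delimiter(r"\t")
comma = decode_delimiter(",")
```

With this approach a plain printable delimiter passes through unchanged, while an input such as `U+001F` is still rejected because it does not decode to a single character.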

Closing in the hope that the bq CLI team will consider this enhancement.
