Using multiple '*' patterns when loading into BigQuery won't work
Question
We're trying to use a glob pattern when loading into BigQuery, for example:
gs://<bucket_name>/Network*Impressions_12345_20150201*
We have both "..NetworkImpressions_.." and "..NetworkBackfillImpressions_.." files in our bucket, so we use the first '*' to scoop up both types of files. But BQ borks with:
"Not found: URI gs://backup-gdfp-7415/Network*Impressions_232503_20150101_20*"
The files definitely exist. If we remove the first '*' it works fine, and it also works when we explicitly specify both types.
Here's the job ID for a failed load job where we tried to use the pattern: job_LXNGEAeWsaU9HyFgcCCJMHu8YtY
I would have thought this should be possible with BigQuery?
Recommended answer
From the documentation for the sourceUris parameter of the load job configuration:
[Required] The fully-qualified URIs that point to your data in Google Cloud Storage. Wildcard names are only supported when they appear at the end of the URI.
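Given that restriction, one workaround is the one the question already found to work: list both file types explicitly, each with a single trailing '*'. The following is a minimal sketch of building such a list in Python. The helper name build_source_uris and its parameters are hypothetical, chosen only for illustration; the resulting list is what would be supplied as sourceUris in the load job configuration.

```python
def build_source_uris(bucket, prefixes, suffix):
    """Return one URI per prefix, each ending in a single trailing '*'.

    Hypothetical helper: not part of any BigQuery library.
    """
    uris = []
    for prefix in prefixes:
        uri = f"gs://{bucket}/{prefix}{suffix}*"
        # Enforce the constraint from the quoted documentation: the only
        # wildcard allowed is the final character of the URI.
        if uri.count("*") != 1 or not uri.endswith("*"):
            raise ValueError(f"wildcard must appear only at the end: {uri}")
        uris.append(uri)
    return uris


# Expand the failing pattern Network*Impressions_... into the two
# concrete prefixes that exist in the bucket.
source_uris = build_source_uris(
    bucket="backup-gdfp-7415",
    prefixes=["NetworkImpressions", "NetworkBackfillImpressions"],
    suffix="_232503_20150101_20",
)
print(source_uris)
```

Passing a prefix that itself contains '*' (such as "Network*Impressions") raises a ValueError in this sketch, which surfaces the problem locally instead of as a "Not found" error from the load job.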