Need help creating schema for loading CSV into BigQuery

Question

I am trying to load some CSV files from Google Cloud Storage into BigQuery and am wrestling with schema generation. There is an auto-generate option, but it is poorly documented. If I let BigQuery generate the schema, it does a decent job of guessing data types, but it only sometimes recognizes the first row of the data as a header row; other times it treats the first row as data and generates column names like string_field_N. The first row of my data is always a header row. Some of the tables have many columns (over 30), and I do not want to write the schema by hand, because BigQuery always fails with an uninformative error message when something (I have no idea what) is wrong with the schema.

So: How can I force it to recognize the first row as a header row? If that isn't possible, how do I get it to spit out the schema it generated in the proper syntax so that I can edit it (for appropriate column names) and use that as the schema on import?

Recommended answer

Schema auto-detection in BigQuery should detect the first row of your CSV file as column names in most cases. One case where column name detection fails is when the data types are similar throughout the CSV file, so the header row looks no different from the data rows. For instance, BigQuery schema auto-detection cannot detect the header names for the following file, since every field is a String.

headerA, headerB
row1a, row1b
row2a, row2b
row3a, row3b

The "Header rows to skip" option in the UI does not help work around this shortcoming of schema auto-detection in BigQuery.
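The answer above does not suggest a workaround. One possible approach, offered here only as a sketch, is to build the schema from the header row yourself and save it as a BigQuery JSON schema file, which is a list of objects with "name", "type" and "mode" keys. The function name schema_from_header is hypothetical. Typing every column as a nullable STRING is an assumption you would edit afterwards, and the name clean-up follows a conservative rule (letters, digits and underscores only) that may be stricter than BigQuery requires.

```python
import csv
import io
import json
import re


def schema_from_header(csv_text, default_type="STRING"):
    """Build a BigQuery-style JSON schema (a list of field dicts) from a CSV header row.

    Hypothetical helper: every column gets `default_type` and mode NULLABLE,
    which is an assumption to be edited by hand afterwards.
    """
    # skipinitialspace drops the blank that follows each comma in the sample file.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)

    fields = []
    for raw_name in header:
        # Conservative clean-up: keep only ASCII letters, digits and underscores.
        name = re.sub(r"[^A-Za-z0-9_]", "_", raw_name.strip())
        # Avoid a leading digit by prefixing an underscore.
        if name and name[0].isdigit():
            name = "_" + name
        fields.append({"name": name, "type": default_type, "mode": "NULLABLE"})
    return fields


# The sample file from the answer, where every field is a string.
sample = "headerA, headerB\nrow1a, row1b\nrow2a, row2b\nrow3a, row3b\n"
schema = schema_from_header(sample)
print(json.dumps(schema, indent=2))
```

The printed JSON can be saved to a file, edited to set the right types, and passed as the schema when loading. Setting the number of header rows to skip to 1 then makes sense, because the column names come from the schema file rather than from auto-detection.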
