Let Google BigQuery infer schema from csv string file
Question
I want to upload CSV data into BigQuery. When the data has different types (like string and int), BigQuery can infer the column names from the headers, because the headers are all strings whereas the other lines contain integers.
BigQuery infers headers by comparing the first row of the file with other rows in the data set. If the first line contains only strings, and the other lines do not, BigQuery assumes that the first row is a header row.
https://cloud.google.com/bigquery/docs/schema-detect
The problem arises when all of your data is strings...
You can specify --skip_leading_rows, but BigQuery still does not use the first row as the column names.
I know I can specify the column names manually, but I would prefer not to, as I have a lot of tables. Is there another solution?
Recommended answer
If all of your data is of type STRING and the first row of your CSV file contains the column names, it should be easy to write a quick script that parses the first line of the CSV and generates a "create table" command like this:
bq mk --schema name:STRING,street:STRING,city:STRING... -t mydataset.myNewTable
Use that command to create a new (empty) table, and then load your CSV file into that new table (using --skip_leading_rows, as you mentioned).
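The two steps above can be sketched as a small Bash script. This is a minimal sketch, assuming the header row is a plain comma-separated list of names that are already valid BigQuery column names (no quoting, no embedded commas); the function names `schema_from_header` and `create_and_load` are hypothetical, not part of the bq tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn the CSV header line into an inline bq schema,
# e.g. "name,street,city" -> "name:STRING,street:STRING,city:STRING".
# Assumes an unquoted header with no embedded commas.
schema_from_header() {
  local csv_file="$1"
  head -n 1 "$csv_file" \
    | tr -d '\r' \
    | sed -e 's/,/:STRING,/g' -e 's/$/:STRING/'
}

# Hypothetical helper: create an empty table with that schema, then load
# the file into it, skipping the header row.
create_and_load() {
  local csv_file="$1" table="$2"
  bq mk --schema "$(schema_from_header "$csv_file")" -t "$table"
  bq load --source_format=CSV --skip_leading_rows=1 "$table" "$csv_file"
}
```

For example, `create_and_load myData.csv mydataset.myNewTable` runs the `bq mk` and `bq load` commands for one file. The `tr -d '\r'` step strips Windows line endings so the last column name does not end up with a stray carriage return.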
Update, 14 February 2018 (thanks to Felipe's comment): the above can be simplified this way:
bq mk --schema `head -1 myData.csv` -t mydataset.myNewTable
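Since the question mentions a lot of tables, the one-liner can be wrapped in a loop over all CSV files. This is a minimal sketch, assuming every file in the current directory is named `<table>.csv`, its header has no quoting or embedded commas, and the target dataset already exists; `table_name_for` and `load_all` are hypothetical names. Like the one-liner, it passes the raw header (for example `name,city`) to `--schema`, relying on bq treating a column given without a type as STRING.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: derive a table name from a file name,
# e.g. "./customers.csv" -> "customers".
table_name_for() {
  basename "$1" .csv
}

# Hypothetical helper: create and load one table per CSV file in the
# current directory, using the header line as the schema.
load_all() {
  local dataset="$1" f table header
  for f in ./*.csv; do
    [ -e "$f" ] || continue   # no matching files: the glob stays literal
    table="$(table_name_for "$f")"
    # Strip a trailing carriage return (Windows line endings) from the header.
    header="$(head -n 1 "$f" | tr -d '\r')"
    bq mk --schema "$header" -t "${dataset}.${table}"
    bq load --source_format=CSV --skip_leading_rows=1 "${dataset}.${table}" "$f"
  done
}
```

For example, running `load_all mydataset` in a directory containing `people.csv` creates and loads `mydataset.people`.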