BigQuery: Load from CSV, skip columns
Say I have a table with existing data, with a schema like:
{ 'name' : 'Field1', 'type' : 'STRING' },
{ 'name' : 'Field2', 'type' : 'STRING' }
Our data is CSV:
Field1,Field2
Value1,Value2
...
We load data by creating a new job that loads a CSV file directly from Google Cloud Storage (GCS). Our data files now have an additional column and a different column order, so the data is now structured as:
Field1,Field3,Field2
Value1,Value3,Value2
...
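To make the desired result concrete, here is the projection the load job would need to perform, sketched locally with Python's standard `csv` module. It is only an illustration on a hypothetical in-memory sample: it matches columns by header name and keeps `Field1` and `Field2`. It is not a BigQuery feature.

```python
import csv
import io

# Hypothetical sample in the new layout: Field3 sits between Field1 and Field2.
NEW_CSV = "Field1,Field3,Field2\nValue1,Value3,Value2\n"

# Columns of the existing table, in table order.
TARGET_FIELDS = ["Field1", "Field2"]


def project_columns(csv_text, wanted):
    """Return each data row reduced to the `wanted` columns, matched by header name."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first line is used as the header
    return [[row[name] for name in wanted] for row in reader]


rows = project_columns(NEW_CSV, TARGET_FIELDS)
print(rows)  # [['Value1', 'Value2']]
```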
Is there a way to specify in the load job that we would like to skip the second column, and only load columns 1 and 3 (named Field1 and Field2)?
I am using the Python API, e.g., service.jobs().insert(job_body).
Basically I want to do something like this:
job_body = {
    'projectId': projectId,
    'configuration': {
        'load': {
            'sourceUris': [sourceCSV],
            'schema': {
                'fields': [
                    {
                        'name': 'Field1',
                        'type': 'STRING'
                    },
                    {  # this would be the skipped field
                        'name': None,
                        'skip': True  # wished-for option; BigQuery has no such field
                    },
                    {
                        'name': 'Field2',
                        'type': 'STRING'
                    },
                ]
            },
            'destinationTable': {
                'projectId': projectId,
                'datasetId': datasetId,
                'tableId': targetTableId
            },
        }
    }
}
Thanks!
It's not currently possible to do that, but it could be an interesting feature request. Feel free to add it to https://code.google.com/p/google-bigquery/issues/list.
In the meantime, I would do a two-step import:
- Import the file into a new table with all 3 columns.
- Run "SELECT Field1, Field2 FROM [newtable]" and append the results to the existing table.
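The two steps above can be sketched as BigQuery v2 REST job bodies built in plain Python. This is a minimal sketch under some assumptions. The project, dataset, table and bucket names are hypothetical placeholders. The query uses standard SQL (`useLegacySql: False`), so it uses backtick table references instead of the legacy `[newtable]` syntax. Each body would be submitted with `service.jobs().insert(projectId=..., body=...)`, and the second job only after the first has finished.

```python
def staging_load_job(project_id, dataset_id, staging_table, source_uri):
    """Step 1: load the CSV, with all three columns, into a new staging table."""
    return {
        "configuration": {
            "load": {
                "sourceUris": [source_uri],
                "sourceFormat": "CSV",
                "skipLeadingRows": 1,  # skip the header line Field1,Field3,Field2
                "schema": {
                    "fields": [  # schema follows the file's column order
                        {"name": "Field1", "type": "STRING"},
                        {"name": "Field3", "type": "STRING"},
                        {"name": "Field2", "type": "STRING"},
                    ]
                },
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": staging_table,
                },
                "writeDisposition": "WRITE_TRUNCATE",  # overwrite any previous staging data
            }
        }
    }


def append_query_job(project_id, dataset_id, staging_table, target_table):
    """Step 2: append only Field1 and Field2 from the staging table to the target table."""
    query = f"SELECT Field1, Field2 FROM `{project_id}.{dataset_id}.{staging_table}`"
    return {
        "configuration": {
            "query": {
                "query": query,
                "useLegacySql": False,  # standard SQL, hence the backtick table reference
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": target_table,
                },
                "writeDisposition": "WRITE_APPEND",  # add rows to the existing table
            }
        }
    }


# Hypothetical names, for illustration only.
load_body = staging_load_job("my-project", "my_dataset", "staging_3col", "gs://my-bucket/data.csv")
append_body = append_query_job("my-project", "my_dataset", "staging_3col", "existing_table")

# With an authorized `service` object from googleapiclient:
# service.jobs().insert(projectId="my-project", body=load_body).execute()
# ...wait until the load job's status.state is "DONE", then:
# service.jobs().insert(projectId="my-project", body=append_body).execute()
```

The staging table can be deleted once the append succeeds. Keeping the two job bodies as pure functions makes them easy to inspect before anything is sent to BigQuery.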