BigQuery: Load from CSV, skip columns


Question

Say I have a table with existing data, with a schema like:

{ 'name' : 'Field1', 'type' : 'STRING' },
{ 'name' : 'Field2', 'type' : 'STRING' }

Our data is CSV:

Field1,Field2
Value1,Value2
...

We load data by creating a new job, loading a CSV directly from Google Cloud Storage (GCS). Our data files now have an additional column and different ordering, such that the data is now structured:

Field1,Field3,Field2
Value1,Value3,Value2
...

Is there a way to specify in the load job that we would like to skip the second column, and only load columns 1 and 3 (named Field1 and Field2)?

I am using the Python API, e.g. service.jobs().insert(job_body).

Basically I want to do something like this:

job_body = {
  'projectId': projectId,
  'configuration': {
    'load': {
      'sourceUris': [sourceCSV],
      'schema': {
        'fields': [
          {
            'name': 'Field1',
            'type': 'STRING'
          },
          {  # this would be the skipped field (hypothetical 'skip' key)
            'name': None,
            'skip': True
          },
          {
            'name': 'Field2',
            'type': 'STRING'
          },
        ]
      },
      'destinationTable': {
        'projectId': projectId,
        'datasetId': datasetId,
        'tableId': targetTableId
      },
    }
  }
}

Thanks!

Solution

It's not currently possible to do that, but it could be an interesting feature request. Feel free to add it to https://code.google.com/p/google-bigquery/issues/list.

In the meantime, I would do a 2 step import:

  1. Import as a new table with 3 columns.
  2. Append the result of "SELECT column1, column2 FROM [newtable]" to the existing table.
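
The two steps above could be sketched roughly as follows. This is only an illustration, not a tested recipe: the helper names build_staging_load_job and build_append_query_job are made up for this example, the staging table and its Field3 type (STRING) are assumptions, and the CSV is assumed to start with a header row. The functions only build the job bodies; nothing is sent to BigQuery here.

```python
def build_staging_load_job(project_id, dataset_id, staging_table_id, source_uri):
    """Step 1: load all three CSV columns into a new staging table."""
    return {
        'projectId': project_id,
        'configuration': {
            'load': {
                'sourceUris': [source_uri],
                # Assumption: the first CSV line is a header (Field1,Field3,Field2).
                'skipLeadingRows': 1,
                'schema': {
                    'fields': [
                        {'name': 'Field1', 'type': 'STRING'},
                        # Assumption: Field3 is a STRING like the other columns.
                        {'name': 'Field3', 'type': 'STRING'},
                        {'name': 'Field2', 'type': 'STRING'},
                    ]
                },
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': staging_table_id,
                },
                # Replace any previous contents of the staging table.
                'writeDisposition': 'WRITE_TRUNCATE',
            }
        }
    }


def build_append_query_job(project_id, dataset_id, staging_table_id, target_table_id):
    """Step 2: select only the wanted columns and append them to the target."""
    # Bracketed table names are legacy SQL syntax, as in the answer above.
    query = 'SELECT Field1, Field2 FROM [{}.{}]'.format(dataset_id, staging_table_id)
    return {
        'projectId': project_id,
        'configuration': {
            'query': {
                'query': query,
                'useLegacySql': True,
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': target_table_id,
                },
                # Append to the existing table instead of overwriting it.
                'writeDisposition': 'WRITE_APPEND',
            }
        }
    }
```

Each body would then be submitted in the same way as in the question, for example service.jobs().insert(projectId=projectId, body=job_body).execute(), and the second job should only be started once the first has finished. The staging table can be deleted afterwards.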
