BigQuery: Load from CSV, skip columns

Views: 173 Published: 2018/5/7 17:42:43 Tags: python csv google-bigquery


Problem description

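Say I have a table with existing data, with a schema like:

    {'name': 'Field1', 'type': 'STRING'},
    {'name': 'Field2', 'type': 'STRING'}

Our data is CSV:

    Field1,Field2
    Value1,Value2
    ...

We load data by creating a new job that loads a CSV directly from Google Cloud Storage (GCS). Our data files now have an additional column and a different column order, so the data is now structured as:

    Field1,Field3,Field2
    Value1,Value3,Value2
    ...

Is there a way to specify in the load job that we would like to skip the second column and load only columns 1 and 3 (named Field1 and Field2)?

I am using the Python API, e.g. service.jobs().insert(job_body)

Basically, I want to do something like this: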

    job_body = {
        'projectId': projectId,
        'configuration': {
            'load': {
                'sourceUris': [sourceCSV],
                'schema': {
                    'fields': [
                        {
                            'name': 'Field1',
                            'type': 'STRING'
                        },
                        {  # this would be the skipped field ('skip' is wished-for, not a real option)
                            'name': None,
                            'skip': True
                        },
                        {
                            'name': 'Field2',
                            'type': 'STRING'
                        },
                    ]
                },
                'destinationTable': {
                    'projectId': projectId,
                    'datasetId': datasetId,
                    'tableId': targetTableId
                },
            }
        }
    }

Thanks!

Solution

It's not currently possible to do that, but it could be an interesting feature request. Feel free to add it to https://code.google.com/p/google-bigquery/issues/list.

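In the meantime, I would do a 2-step import:

Import the CSV as a new table with 3 columns.

Append the result of "SELECT column1, column2 FROM [newtable]" to the existing table.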
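
The two steps could be sketched as a pair of job bodies, as below. This is only an illustration under stated assumptions, not a confirmed recipe from the answer: the helper names and the staging table are hypothetical, Field3 is assumed to be a STRING, the CSV is assumed to have a single header row, and the bracketed table reference is assumed to run as legacy SQL.

```python
def build_staging_load_job(project_id, dataset_id, staging_table_id, source_uri):
    """Step 1: load all three CSV columns into a new staging table."""
    return {
        'configuration': {
            'load': {
                'sourceUris': [source_uri],
                'skipLeadingRows': 1,  # assumption: the file has one header row
                'schema': {
                    'fields': [
                        {'name': 'Field1', 'type': 'STRING'},
                        {'name': 'Field3', 'type': 'STRING'},  # assumed type
                        {'name': 'Field2', 'type': 'STRING'},
                    ]
                },
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': staging_table_id,
                },
            }
        }
    }


def build_append_query_job(project_id, dataset_id, staging_table_id, target_table_id):
    """Step 2: append only Field1 and Field2 to the existing table."""
    query = 'SELECT Field1, Field2 FROM [{}.{}]'.format(dataset_id, staging_table_id)
    return {
        'configuration': {
            'query': {
                'query': query,
                'useLegacySql': True,  # the [dataset.table] form is legacy SQL
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': target_table_id,
                },
                'writeDisposition': 'WRITE_APPEND',  # keep the existing rows
            }
        }
    }
```

Each body would then be submitted the same way as in the question, for example service.jobs().insert(projectId=project_id, body=job_body).execute(), waiting for the load job to finish before starting the query job. The staging table can be deleted once the append has succeeded.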

