API config for BigQuery Federated Data Source


Problem description


I have the following config that works fine for loading a bunch of files into BigQuery:

config = {
  'configuration' => {
    'load' => {
      'sourceUris' => ['gs://my_bucket/my_files_*'],
      'schema' => {
        'fields' => fields_array
      },
      'schemaUpdateOptions' => ['ALLOW_FIELD_ADDITION'],
      'destinationTable' => {
        'projectId' => 'my_project',
        'datasetId' => 'my_dataset',
        'tableId' => 'my_table'
      },
      'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
      'createDisposition' => 'CREATE_IF_NEEDED',
      'writeDisposition' => 'WRITE_TRUNCATE',
      'maxBadRecords' => 0
    }
  }
}

This is then executed with the following, where client is pre-initialised:

result = client.execute(
  api_method: big_query.jobs.insert,
  parameters: { projectId: 'my_project' },
  body_object: config
)
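
For context, client and big_query above come from the legacy google-api-client gem (pre-0.9). A minimal sketch of that initialisation, assuming application-default credentials; the application name is just a placeholder, not part of the original setup:

require 'google/api_client'
require 'googleauth'

# Discovery-based client from the legacy google-api-client gem (< 0.9)
client = Google::APIClient.new(
  application_name: 'bq-etl',   # placeholder name
  application_version: '1.0'
)
# Authorise with application-default credentials and the BigQuery scope
client.authorization = Google::Auth.get_application_default(
  ['https://www.googleapis.com/auth/bigquery']
)
# Discover the BigQuery v2 API surface (big_query.jobs, big_query.tables, ...)
big_query = client.discovered_api('bigquery', 'v2')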

I am now trying to write the equivalent to create an external / federated data source instead of loading the data. I need to do this to effectively create staging tables for ETL purposes. I have successfully done this using the BigQuery UI, but I need to run it in code as it will eventually be a daily automated process. I'm having a bit of trouble with the API docs and can't find any good examples to refer to. Can anyone help? Thanks in advance!

Solution

For anyone attempting the same, here's what I used to get it working. There are not many working examples online and the docs take some deciphering, so I hope this helps someone else!

config = {
  'kind' => 'bigquery#table',
  'tableReference' => {
    'projectId' => 'my_project',
    'datasetId' => 'my_dataset',
    'tableId' => 'my_table'
  },
  'externalDataConfiguration' => {
    'autodetect' => true,
    'sourceUris' => ['gs://my_bucket/my_files_*'],
    'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
    'maxBadRecords' => 10
  }
}

The documentation for externalDataConfiguration can be found in the BigQuery REST API reference and the "Try this API" section for bigquery.tables.insert.
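
If autodetect proves too loose for a daily automated process, externalDataConfiguration also accepts an explicit schema (in which case autodetect should be omitted). A hedged sketch; the field names below are hypothetical, not from the original table:

# Swap autodetection for an explicit schema (hypothetical fields)
config['externalDataConfiguration'].delete('autodetect')
config['externalDataConfiguration']['schema'] = {
  'fields' => [
    { 'name' => 'id',         'type' => 'STRING',    'mode' => 'REQUIRED' },
    { 'name' => 'created_at', 'type' => 'TIMESTAMP', 'mode' => 'NULLABLE' }
  ]
}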

Then, as pointed out in Hua Zhang's answer, you run bigquery.tables.insert instead of bigquery.jobs.insert:

result = client.execute(
  api_method: big_query.tables.insert,
  parameters: { projectId: 'my_project', datasetId: 'my_dataset' },
  body_object: config
)
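
It may also be worth checking the result and sanity-checking the new federated table with a quick query. A minimal sketch, assuming the same client and big_query objects as above:

# Fail loudly if the table insert was rejected
raise "tables.insert failed: #{result.error_message}" if result.error?

# Query the federated table to confirm BigQuery can read the GCS files
check = client.execute(
  api_method: big_query.jobs.query,
  parameters: { projectId: 'my_project' },
  body_object: {
    'query'        => 'SELECT COUNT(*) FROM `my_project.my_dataset.my_table`',
    'useLegacySql' => false
  }
)
puts check.data.to_hash['rows'].inspect unless check.error?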
