API config for BigQuery Federated Data Source
I have the following config that works fine for loading a bunch of files into BigQuery:
config = {
  'configuration' => {
    'load' => {
      'sourceUris' => 'gs://my_bucket/my_files_*',
      'schema' => {
        'fields' => fields_array
      },
      'schemaUpdateOptions' => [{ 'ALLOW_FIELD_ADDITION' => true }],
      'destinationTable' => {
        'projectId' => 'my_project',
        'datasetId' => 'my_dataset',
        'tableId' => 'my_table'
      },
      'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
      'createDisposition' => 'CREATE_IF_NEEDED',
      'writeDisposition' => 'WRITE_TRUNCATE',
      'maxBadRecords' => 0
    }
  }
}
This is then executed with the following, where client is pre-initialised:
result = client.execute(
  api_method: big_query.jobs.insert,
  parameters: { projectId: 'my_project', datasetId: 'my_dataset' },
  body_object: config
)
I am now trying to write the equivalent to create an external / federated data source instead of loading the data. I need to do this to effectively create staging tables for ETL purposes. I have successfully done this using the BigQuery UI, but I need to run it in code as it will eventually be a daily automated process. I'm having a bit of trouble with the API docs and can't find any good examples to refer to. Can anyone help? Thanks in advance!
For anyone attempting the same, here's what I used to get it working. There are not many working examples online and the docs take some deciphering, so hope this helps someone else!
config = {
  'kind' => 'bigquery#table',
  'tableReference' => {
    'projectId' => 'my_project',
    'datasetId' => 'my_dataset',
    'tableId' => 'my_table'
  },
  'externalDataConfiguration' => {
    'autodetect' => true,
    'sourceUris' => ['gs://my_bucket/my_files_*'],
    'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
    'maxBadRecords' => 10
  }
}
The documentation for externalDataConfiguration can be found in the BigQuery REST API reference and the "Try this API" section for bigquery.tables.insert.
Then, as pointed out in Hua Zhang's answer, you run bigquery.tables.insert instead of bigquery.jobs.insert:
result = client.execute(
  api_method: big_query.tables.insert,
  parameters: { projectId: 'my_project', datasetId: 'my_dataset' },
  body_object: config
)