How to get progress on BQ file load
When importing a large CSV (or other type of) file into BigQuery, how can we get the progress of the import? For example, if we have a 1 TB file and use the import csv command, I don't want to just sit and wait ten hours for the file to import. How can we get the progress, or is this not possible?
https://cloud.google.com/bigquery/loading-data
Right now, we're not able to get any progress information until the CSV file has finished loading.
Regarding a progress bar:
Load-specific statistics are never returned while the job is in progress. The statistics contain only start/end times, and the Java API parses them into a CopyStatistics class instead.
{
"kind": "bigquery#job",
"etag": "\"smpMas70-D1-zV2oEH0ud6qY21c/crKHebm6x2NXA6pCjE8znB7dp-E\"",
"id": "YYY:job_l9TWVQ64YjKx7BgDufu2gReMEL0",
"selfLink": "https://www.googleapis.com/bigquery/v2/projects/YYY/jobs/job_l9TWVQ64YjKx7BgDufu2gReMEL0",
"jobReference": {
"projectId": "YYY",
"jobId": "job_l9TWVQ64YjKx7BgDufu2gReMEL0"
},
"configuration": {
"load": {
"sourceUris": [
"gs://datadocs/afdfb50f-cbc2-47d4-985e-080cadefc963"
],
"schema": {
"fields": [
...
]
},
"destinationTable": {
"projectId": "YYY",
"datasetId": "1aaf1682dbc2403e92a0a0ed8534581f",
"tableId": "ORIGIN"
},
"createDisposition": "CREATE_IF_NEEDED",
"writeDisposition": "WRITE_EMPTY",
"fieldDelimiter": ",",
"skipLeadingRows": 1,
"quote": "\"",
"maxBadRecords": 1000,
"allowQuotedNewlines": true,
"sourceFormat": "CSV"
}
},
"status": {
"state": "RUNNING"
},
"statistics": {
"creationTime": "1490868448431",
"startTime": "1490868449147"
},
"user_email": "YYY@appspot.gserviceaccount.com"
}
Load statistics are only returned at the end, once the whole CSV file has been imported.
How do we get the progress while it's being uploaded?
Check out statistics.load.outputBytes.
Per the documentation, this value may change while a load job is in the running state.
You can experiment to see whether it can be used as a progress metric by calling Jobs: get repeatedly while the job runs.
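As a rough illustration of that experiment, here is a minimal Python sketch that polls a job resource and reads statistics.load.outputBytes on each pass. The helper names (load_output_bytes, poll_load_progress) and the fetch_job callable are hypothetical and not part of any BigQuery client library; fetch_job is assumed to return the same JSON resource that Jobs: get returns, for example by wrapping service.jobs().get(projectId=..., jobId=...).execute() from google-api-python-client. Whether outputBytes actually appears before the job finishes is exactly what remains to be verified.

```python
import time
from typing import Callable, Iterator, Optional, Tuple


def load_output_bytes(job: dict) -> Optional[int]:
    """Return statistics.load.outputBytes from a job resource, or None if absent."""
    value = job.get("statistics", {}).get("load", {}).get("outputBytes")
    # The REST API encodes int64 values as strings, so convert explicitly.
    return int(value) if value is not None else None


def poll_load_progress(
    fetch_job: Callable[[], dict],               # hypothetical: wraps a Jobs: get call
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[Optional[str], Optional[int]]]:
    """Yield (state, output_bytes) after each poll until the job state is DONE."""
    while True:
        job = fetch_job()
        state = job.get("status", {}).get("state")
        yield state, load_output_bytes(job)
        if state == "DONE":
            return
        sleep(interval_seconds)


if __name__ == "__main__":
    # Canned responses stand in for real Jobs: get results (made-up values).
    canned = iter([
        {"status": {"state": "RUNNING"}, "statistics": {"startTime": "1490868449147"}},
        {"status": {"state": "RUNNING"}, "statistics": {"load": {"outputBytes": "1024"}}},
        {"status": {"state": "DONE"}, "statistics": {"load": {"outputBytes": "4096"}}},
    ])
    for state, output_bytes in poll_load_progress(lambda: next(canned), sleep=lambda _: None):
        print(state, output_bytes)
```

Note that outputBytes counts bytes written to the destination table, not bytes read from the source file, so it may only be suitable as a relative indicator that the load is advancing rather than as an exact percentage.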