How to get progress on BQ file load


Problem description

When importing a large CSV (or other type of) file into BigQuery, how can we get the progress of the import? For example, if we have a 1 TB file and run a CSV load, I don't want to just wait ten hours for the file to import. How can we get the progress, or is this not possible?

https://cloud.google.com/bigquery/loading-data

Right now, we're not able to get it until the CSV file has been loaded.


Regarding a progress bar:

Load-specific statistics are never returned while the job is in progress. The statistics contain only start/end times, and the Java API parses them into the CopyStatistics class instead. This is the job resource returned while the load is running:

{
 "kind": "bigquery#job",
 "etag": "\"smpMas70-D1-zV2oEH0ud6qY21c/crKHebm6x2NXA6pCjE8znB7dp-E\"",
 "id": "YYY:job_l9TWVQ64YjKx7BgDufu2gReMEL0",
 "selfLink": "https://www.googleapis.com/bigquery/v2/projects/YYY/jobs/job_l9TWVQ64YjKx7BgDufu2gReMEL0",
 "jobReference": {
  "projectId": "YYY",
  "jobId": "job_l9TWVQ64YjKx7BgDufu2gReMEL0"
 },
 "configuration": {
  "load": {
   "sourceUris": [
    "gs://datadocs/afdfb50f-cbc2-47d4-985e-080cadefc963"
   ],
   "schema": {
    "fields": [
       ...
    ]
   },
   "destinationTable": {
    "projectId": "YYY",
    "datasetId": "1aaf1682dbc2403e92a0a0ed8534581f",
    "tableId": "ORIGIN"
   },
   "createDisposition": "CREATE_IF_NEEDED",
   "writeDisposition": "WRITE_EMPTY",
   "fieldDelimiter": ",",
   "skipLeadingRows": 1,
   "quote": "\"",
   "maxBadRecords": 1000,
   "allowQuotedNewlines": true,
   "sourceFormat": "CSV"
  }
 },
 "status": {
  "state": "RUNNING"
 },
 "statistics": {
  "creationTime": "1490868448431",
  "startTime": "1490868449147"
 },
 "user_email": "YYY@appspot.gserviceaccount.com"
}

Load statistics are only returned at the end, once the whole CSV file has been imported.


How do we get the progress while it's being uploaded?

Solution

Check out statistics.load.outputBytes.

Per the documentation, this value may change while a load job is in the running state.

You can experiment with it to see whether it can be used as a progress metric, by calling Jobs: get repeatedly while the job runs.
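
The polling idea can be sketched roughly as follows, in Python. This is only a sketch under assumptions: `fetch_job` is a hypothetical callable (not part of any BigQuery library) that performs a Jobs: get request and returns the job resource as a dict shaped like the JSON in the question, and `load_output_bytes` and `poll_load_progress` are illustrative names. Whether outputBytes actually appears and grows while the job is running is exactly what you would need to verify by experiment.

```python
import time
from typing import Callable, Iterator, Optional


def load_output_bytes(job: dict) -> Optional[int]:
    """Return statistics.load.outputBytes from a job resource, or None if absent."""
    value = job.get("statistics", {}).get("load", {}).get("outputBytes")
    # The REST API encodes int64 fields as strings (see creationTime in the
    # question's JSON), so convert explicitly.
    return int(value) if value is not None else None


def poll_load_progress(
    fetch_job: Callable[[], dict],  # hypothetical: wraps a Jobs: get call
    interval_s: float = 10.0,
) -> Iterator[Optional[int]]:
    """Yield outputBytes after each poll until the job reaches the DONE state."""
    while True:
        job = fetch_job()
        yield load_output_bytes(job)
        if job.get("status", {}).get("state") == "DONE":
            return
        time.sleep(interval_s)
```

Note that the value is a byte count, not a percentage, so it can only serve as a progress indicator if you have some estimate of the final size to compare it against.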
