How to import a JSON file from Cloud Storage into BigQuery


Question

I am trying to import a file (json.txt) from Cloud Storage into BigQuery via the API, and errors are thrown. When this is done via the web UI it works with no errors (I even set maxBadRecords=0). Could someone please tell me what I'm doing incorrectly here? Is the code wrong, or do I need to change some setting in BigQuery somewhere?

The file is a plain-text UTF-8 file with the contents shown below. I've kept to the docs on BigQuery and JSON imports.

{"person_id":225,"person_name":"John","object_id":1}
{"person_id":226,"person_name":"John","object_id":1}
{"person_id":227,"person_name":"John","object_id":null}
{"person_id":229,"person_name":"John","object_id":1}

On importing, the job throws the following error for every single line: "Value cannot be converted to expected type."

   {
    "reason": "invalid",
    "location": "Line:15 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "location": "Line:16 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "location": "Line:17 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "location": "Line:18 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "message": "Too many errors encountered. Limit is: 10."
   }
  ]
 },
 "statistics": {
  "creationTime": "1384484132723",
  "startTime": "1384484142972",
  "endTime": "1384484182520",
  "load": {
   "inputFiles": "1",
   "inputFileBytes": "960",
   "outputRows": "0",
   "outputBytes": "0"
  }
 }
}

The file can be accessed here: http://www.sendspace.com/file/7q0o37

My code and schema are as follows:

def insert_and_import_table_in_dataset(tar_file, table, dataset=DATASET)
  config = {
    'configuration' => {
      'load' => {
        'sourceUris' => ["gs://test-bucket/#{tar_file}"],
        'schema' => {
          'fields' => [
            { 'name' => 'person_id',   'type' => 'INTEGER', 'mode' => 'nullable' },
            { 'name' => 'person_name', 'type' => 'STRING',  'mode' => 'nullable' },
            { 'name' => 'object_id',   'type' => 'INTEGER', 'mode' => 'nullable' }
          ]
        },
        'destinationTable' => {
          'projectId' => @project_id.to_s,
          'datasetId' => dataset,
          'tableId' => table
        },
        'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
        'createDisposition' => 'CREATE_IF_NEEDED',
        'maxBadRecords' => 10
      }
    }
  }

  result = @client.execute(
    :api_method => @bigquery.jobs.insert,
    :parameters => {
      # 'uploadType' => 'resumable',
      :projectId => @project_id.to_s,
      :datasetId => dataset
    },
    :body_object => config
  )

  # upload = result.resumable_upload
  # @client.execute(upload) if upload.resumable?

  puts result.response.body
  json = JSON.parse(result.response.body)
  # Poll until the load job reports DONE.
  while true
    job_status = get_job_status(json['jobReference']['jobId'])
    if job_status['status']['state'] == 'DONE'
      puts "DONE"
      return true
    else
      puts job_status['status']['state']
      puts job_status
      sleep 5
    end
  end
end
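The get_job_status helper is not shown in the question. A minimal sketch of one, assuming the same @client, @bigquery, and @project_id objects used above, could be:

def get_job_status(job_id)
  # Fetch the job via the BigQuery jobs.get method and return the parsed JSON body.
  result = @client.execute(
    :api_method => @bigquery.jobs.get,
    :parameters => { :projectId => @project_id.to_s, :jobId => job_id }
  )
  JSON.parse(result.response.body)
end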

Could someone please tell me what I am doing wrong? What do I fix, and where?

Also, at some point in the future I expect to be using compressed files and importing from them. Is ".tar.gz" OK for that, or do I need to make it a ".gz" only?

Thank you in advance for all help. Appreciate it.

Answer

You're getting hit by the same thing that a lot of people (including me) have gotten hit by: you are importing a JSON file but not specifying an import format, so it defaults to CSV.

If you set configuration.load.sourceFormat to NEWLINE_DELIMITED_JSON, you should be good to go.
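For reference, a minimal sketch of where that key belongs, assuming the same config-hash layout used in the question (the bucket path is hypothetical):

config = {
  'configuration' => {
    'load' => {
      'sourceUris' => ['gs://test-bucket/json.txt'],  # hypothetical path
      # The fix: without this key, the load job parses the file as CSV.
      'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
      'destinationTable' => {
        'projectId' => @project_id.to_s,
        'datasetId' => dataset,
        'tableId' => table
      }
    }
  }
}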

We've got a bug filed to make this harder to do, or at least to detect when the file is the wrong type, and I'll bump its priority.

