Loading JSON file in BigQuery using Google BigQuery Client API


Question

Is there a way to load a JSON file from the local file system into BigQuery using the Google BigQuery Client API?

All the options I found are:

1. Streaming the records one by one (a sketch of this appears after the list).

2. Loading JSON data from GCS.

3. Using raw POST requests to load the JSON (i.e. not through the Google Client API).
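
For context on option 1, here is a minimal sketch of streaming rows through the v2 API's tabledata().insertAll method. It is not from the original post: the row values are hypothetical, PROJECT_ID/DATASET_ID/TABLE_ID are placeholders, and `service` is an authorized BigQuery client like the one built in the solution below:

# Sketch of option 1: streaming inserts (hypothetical row values).
rows = [{'json': {'string_f': 'hello', 'integer_f': 42}}]
response = service.tabledata().insertAll(
  projectId=PROJECT_ID,
  datasetId=DATASET_ID,
  tableId=TABLE_ID,
  body={'rows': rows}).execute()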

Solution

I'm assuming from the python tag that you want to do this from Python. There is a load example here that loads data from a local file (it uses CSV, but it is easy to adapt to JSON; there is another JSON example in the same directory).
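
The code below uses a `jobs` handle on an authorized BigQuery v2 service, which the original answer doesn't show being constructed. A minimal sketch with google-api-python-client, assuming application-default credentials are already configured, might look like:

import google.auth
from googleapiclient.discovery import build

# Application-default credentials (assumes gcloud or a service-account
# environment has already been set up).
credentials, _ = google.auth.default(
  scopes=['https://www.googleapis.com/auth/bigquery'])
service = build('bigquery', 'v2', credentials=credentials)
jobs = service.jobs()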

The basic flow is:

import time

from googleapiclient.http import MediaFileUpload

# Load configuration with the destination specified.
load_config = {
  'destinationTable': {
    'projectId': PROJECT_ID,
    'datasetId': DATASET_ID,
    'tableId': TABLE_ID
  }
}

load_config['schema'] = {
  'fields': [
    {'name':'string_f', 'type':'STRING'},
    {'name':'boolean_f', 'type':'BOOLEAN'},
    {'name':'integer_f', 'type':'INTEGER'},
    {'name':'float_f', 'type':'FLOAT'},
    {'name':'timestamp_f', 'type':'TIMESTAMP'}
  ]
}
load_config['sourceFormat'] = 'NEWLINE_DELIMITED_JSON'
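
# For reference (a hypothetical example, not from the original answer),
# the matching 'foo.json' would hold one JSON object per line, e.g.:
#   {"string_f": "a", "boolean_f": true, "integer_f": 1,
#    "float_f": 1.5, "timestamp_f": "2014-01-01 00:00:00"}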

# This tells it to perform a resumable upload of a local file
# called 'foo.json' 
upload = MediaFileUpload('foo.json',
                         mimetype='application/octet-stream',
                         # This enables resumable uploads.
                         resumable=True)

start = time.time()
job_id = 'job_%d' % start
# Create the job.
result = jobs.insert(
  projectId=PROJECT_ID,
  body={
    'jobReference': {
      'jobId': job_id
    },
    'configuration': {
      'load': load_config
    }
  },
  media_body=upload).execute()

# Then you'd also want to wait for the result and check the status. (Check out
# the example at the link for more info.)
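
A minimal polling loop for that last step might look like this (a sketch, not from the original answer; it reuses the `jobs` handle and `job_id` from above, and the field names follow the BigQuery v2 jobs resource):

import time

# Poll until the load job finishes, then surface any error.
while True:
  status = jobs.get(projectId=PROJECT_ID, jobId=job_id).execute()
  if status['status']['state'] == 'DONE':
    if 'errorResult' in status['status']:
      raise RuntimeError(status['status']['errorResult'])
    break
  time.sleep(1)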

