Load data from Google Cloud Storage to BigQuery using Java

Question

I want to upload data from Google Cloud Storage to BigQuery, but I can't find any Java sample code describing how to do this. Could someone please give me a hint on how to do it?

What I actually want to do is transfer data from Google App Engine tables to BigQuery (and sync it on a daily basis) so that I can do some analysis. I use the Google Cloud Storage Service in Google App Engine to write (new) records to files in Google Cloud Storage; the only missing part is appending the data to tables in BigQuery (or creating a new table on the first write). Admittedly, I could upload/append the data manually with the BigQuery browser tool, but I would like it to be automatic; otherwise I would have to do it by hand every day.

Solution

I don't know of any Java samples for loading tables from Google Cloud Storage into BigQuery. That said, if you follow the codelab's instructions for running query jobs, you can run a load job instead with the following:

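```java
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationLoad;
import com.google.api.services.bigquery.model.JobReference;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableReference;
import com.google.api.services.bigquery.model.TableSchema;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GcsLoadExample {

  // `bigquery` is an authorized client and `projectId` your project ID,
  // both set up as in the codelab.
  static JobReference startLoadJob(Bigquery bigquery, String projectId) throws IOException {
    Job job = new Job();
    JobConfiguration config = new JobConfiguration();
    JobConfigurationLoad loadConfig = new JobConfigurationLoad();
    config.setLoad(loadConfig);

    job.setConfiguration(config);

    // Set where you are importing from (i.e. the Google Cloud Storage paths).
    List<String> sources = new ArrayList<String>();
    sources.add("gs://bucket/csv_to_load.csv");
    loadConfig.setSourceUris(sources);

    // Describe the resulting table you are importing to:
    TableReference tableRef = new TableReference();
    tableRef.setDatasetId("myDataset");
    tableRef.setTableId("myTable");
    tableRef.setProjectId(projectId);
    loadConfig.setDestinationTable(tableRef);

    // Describe the table schema: a STRING column "foo" and an INTEGER column "bar".
    List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
    TableFieldSchema fieldFoo = new TableFieldSchema();
    fieldFoo.setName("foo");
    fieldFoo.setType("STRING");
    TableFieldSchema fieldBar = new TableFieldSchema();
    fieldBar.setName("bar");
    fieldBar.setType("INTEGER");
    fields.add(fieldFoo);
    fields.add(fieldBar);
    TableSchema schema = new TableSchema();
    schema.setFields(fields);
    loadConfig.setSchema(schema);

    // Also set a custom delimiter or header rows to skip here...
    // [not shown].

    // Submit the load job. It runs asynchronously, so this returns once the job is created;
    // see the rest of the codelab for waiting for the job to complete.
    Bigquery.Jobs.Insert insert = bigquery.jobs().insert(projectId, job);
    return insert.execute().getJobReference();
  }
}
```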
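The snippet stops once the job is submitted. Because load jobs run asynchronously, an automated daily sync has to poll until the job finishes. The following is a minimal polling sketch, not the codelab's own code. To keep it runnable without the client library, the job state comes from a `Supplier<String>`; in a real program that supplier would wrap something like `bigquery.jobs().get(projectId, jobRef.getJobId()).execute().getStatus().getState()`. The class name `JobPoller`, the poll interval and the poll limit are illustrative assumptions.

```java
import java.util.function.Supplier;

/**
 * Minimal sketch: poll a job's state until BigQuery reports "DONE".
 * The Supplier is a stand-in; a real one would call jobs().get(...) on the BigQuery client.
 */
public class JobPoller {

  /**
   * Calls stateSupplier up to maxPolls times, sleeping pollMillis between calls.
   * Returns true once the state is "DONE", false on timeout or interruption.
   */
  public static boolean waitForDone(Supplier<String> stateSupplier, long pollMillis, int maxPolls) {
    for (int i = 0; i < maxPolls; i++) {
      // BigQuery job states are PENDING, RUNNING and DONE.
      if ("DONE".equals(stateSupplier.get())) {
        return true;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Fake sequence of states standing in for successive jobs.get() responses.
    String[] states = {"PENDING", "RUNNING", "DONE"};
    int[] calls = {0};
    Supplier<String> fakeState = () -> states[Math.min(calls[0]++, states.length - 1)];
    System.out.println(waitForDone(fakeState, 10, 5)); // prints true
  }
}
```

A state of `DONE` only means the job has finished, not that it succeeded: check `getStatus().getErrorResult()` on the fetched `Job` and treat a non-null value as a failed load.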

For more information on the load configuration object, see the javadoc for JobConfigurationLoad.
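For the daily append described in the question, two more load settings matter: `createDisposition` set to `CREATE_IF_NEEDED` creates the table on the first load, and `writeDisposition` set to `WRITE_APPEND` appends to it on later loads. On `JobConfigurationLoad` these are `setCreateDisposition(...)` and `setWriteDisposition(...)`, and the delimiter and header-row options left out of the code above are `setFieldDelimiter(...)` and `setSkipLeadingRows(...)`. The sketch below uses only the JDK, so it runs without the client library, and builds the equivalent `jobs.insert` request body with the field names of the BigQuery v2 REST load configuration. The class and method names, the placeholder IDs, and the assumption that each CSV file has exactly one header row are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch of the jobs.insert request body for an append-style CSV load (names are hypothetical). */
public class LoadJobJson {

  /** Quotes a string as a JSON string literal (escapes only backslash and double quote). */
  static String q(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** Builds {"configuration":{"load":{...}}} that creates the table if needed and appends to it. */
  static String appendLoadBody(String projectId, String datasetId, String tableId,
      List<String> sourceUris, List<Map.Entry<String, String>> schema) {
    String uris = sourceUris.stream().map(LoadJobJson::q).collect(Collectors.joining(","));
    String fields = schema.stream()
        .map(f -> "{\"name\":" + q(f.getKey()) + ",\"type\":" + q(f.getValue()) + "}")
        .collect(Collectors.joining(","));
    return "{\"configuration\":{\"load\":{"
        + "\"sourceUris\":[" + uris + "],"
        + "\"destinationTable\":{\"projectId\":" + q(projectId)
        + ",\"datasetId\":" + q(datasetId) + ",\"tableId\":" + q(tableId) + "},"
        + "\"schema\":{\"fields\":[" + fields + "]},"
        + "\"sourceFormat\":\"CSV\","
        + "\"skipLeadingRows\":1,"                        // assumes one header row per file
        + "\"fieldDelimiter\":\",\","                     // comma-separated values
        + "\"createDisposition\":\"CREATE_IF_NEEDED\","   // create the table on the first load
        + "\"writeDisposition\":\"WRITE_APPEND\""         // append rows on later loads
        + "}}}";
  }

  public static void main(String[] args) {
    String body = appendLoadBody("my-project", "myDataset", "myTable",
        List.of("gs://bucket/csv_to_load.csv"),
        List.of(Map.entry("foo", "STRING"), Map.entry("bar", "INTEGER")));
    System.out.println(body);
  }
}
```

With the Java client you would not build this string yourself; calling the setters named above on `loadConfig` produces the same fields when the `Job` is serialized.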
