Load data from Google Cloud Storage to BigQuery using Java


Problem description


I want to upload data from Google Cloud Storage to BigQuery, but I can't find any Java sample code describing how to do this. Would someone please give me a hint as to how to do this?


What I actually want to do is transfer data from Google App Engine tables to BigQuery (and sync on a daily basis), so that I can do some analysis. I use the Google Cloud Storage Service in Google App Engine to write (new) records to files in Google Cloud Storage, and the only missing part is to append the data to tables in BigQuery (or create a new table on first write). Admittedly I can manually upload/append the data using the BigQuery browser tool, but I would like it to be automatic, otherwise I need to do it manually every day.

Answer


I don't know of any Java samples for loading tables from Google Cloud Storage into BigQuery. That said, if you follow the instructions for running query jobs here, you can run a load job instead with the following:

import java.util.ArrayList;
import java.util.List;

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.Bigquery.Jobs.Insert;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationLoad;
import com.google.api.services.bigquery.model.JobReference;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableReference;
import com.google.api.services.bigquery.model.TableSchema;

// Build the load-job description.
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationLoad loadConfig = new JobConfigurationLoad();
config.setLoad(loadConfig);

job.setConfiguration(config);

// Set where you are importing from (i.e. the Google Cloud Storage paths).
List<String> sources = new ArrayList<String>();
sources.add("gs://bucket/csv_to_load.csv");
loadConfig.setSourceUris(sources);

// Describe the resulting table you are importing to:
TableReference tableRef = new TableReference();
tableRef.setDatasetId("myDataset");
tableRef.setTableId("myTable");
tableRef.setProjectId(projectId);
loadConfig.setDestinationTable(tableRef);

// Declare the schema of the destination table.
List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
TableFieldSchema fieldFoo = new TableFieldSchema();
fieldFoo.setName("foo");
fieldFoo.setType("STRING");
TableFieldSchema fieldBar = new TableFieldSchema();
fieldBar.setName("bar");
fieldBar.setType("INTEGER");
fields.add(fieldFoo);
fields.add(fieldBar);
TableSchema schema = new TableSchema();
schema.setFields(fields);
loadConfig.setSchema(schema);

// Also set custom delimiter or header rows to skip here....
// [not shown].

// Submit the job and keep its reference for status polling.
Insert insert = bigquery.jobs().insert(projectId, job);
insert.setProjectId(projectId);
JobReference jobRef = insert.execute().getJobReference();

// ... see rest of codelab for waiting for job to complete.
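The snippet ends before the "wait for job to complete" step that the codelab covers. As a hedged sketch of that polling loop, the BigQuery call is abstracted behind a `Supplier` so the shape of the loop stands on its own; in real code the supplier would wrap `bigquery.jobs().get(projectId, jobRef.getJobId()).execute().getStatus().getState()`. The class name `JobPoller` and the backoff constants are illustrative, not part of the original answer:

```java
import java.util.function.Supplier;

public class JobPoller {
    // Capped exponential backoff between polls: 1s, 2s, 4s, ... up to 32s.
    static long backoffMillis(int attempt) {
        return Math.min(1000L << Math.min(attempt, 5), 32000L);
    }

    // Poll until the job reports the terminal state "DONE", sleeping
    // between attempts, or give up after maxAttempts polls.
    static String waitForDone(Supplier<String> stateSupplier, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String state = stateSupplier.get();
            if ("DONE".equals(state)) {
                return state;
            }
            Thread.sleep(backoffMillis(attempt));
        }
        throw new IllegalStateException("Job did not finish in time");
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated job: PENDING, then RUNNING, then DONE.
        String[] states = {"PENDING", "RUNNING", "DONE"};
        int[] i = {0};
        String result = waitForDone(() -> states[Math.min(i[0]++, 2)], 10);
        System.out.println(result); // DONE
    }
}
```

Note that a "DONE" state does not by itself mean success; the job's status should also be checked for an error result before treating the load as complete.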


For more information on the load configuration object, see the javadoc here.
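One piece the question leaves open is producing the CSV files that the load job reads from Cloud Storage (the answer's example reads csv_to_load.csv). Below is a minimal, self-contained sketch of escaping one record as a CSV line in the RFC 4180 style that BigQuery's CSV import understands; the class name `CsvRecords` and its method are hypothetical, not from any Google library:

```java
import java.util.List;

public class CsvRecords {
    // Render one record as a CSV line: quote any field containing a comma,
    // quote, or newline, and double embedded quotes (RFC 4180 style).
    static String toCsvLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            String f = fields.get(i);
            if (f.contains(",") || f.contains("\"") || f.contains("\n")) {
                sb.append('"').append(f.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(f);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Matches the two-column foo/bar schema used in the answer above.
        System.out.println(toCsvLine(List.of("foo, inc.", "42")));
        // → "foo, inc.",42
    }
}
```

Lines produced this way can be written via the App Engine Cloud Storage client and then referenced from `setSourceUris` as in the load job above.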

