How to create a table without a schema in BigQuery via the API?

Question

Simply put, I want to create a table with a given name by providing only the data.

I have some JUnit tests with sample data (JSON files).

Currently I have to provide a schema for each of these files in order to create tables for them.

I suspect I should not need to provide these schemas. Why? Because in the BigQuery console I can create a table from a query (even one as simple as select 1, 'test'), or I can upload a JSON file to create a table with schema autodetection, so it can probably also be done programmatically.

I saw https://chartio.com/resources/tutorials/how-to-create-a-table-from-a-query-in-google-bigquery/#using-the-api and know that I could convert the JSON data into queries and run them with the Jobs.insert API, but that is over-engineered and has other disadvantages, e.g. boilerplate code.

After some research I found a possibly simpler way of creating a table on the fly, but it doesn't work for me. The code is below:

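import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationLoad;
import com.google.api.services.bigquery.model.TableReference;
import java.util.Collections;

// bigquery, projectId, dataSetId, tableId, writeDisposition and sourceUri are defined elsewhere.
// Load job that should create the destination table if needed and autodetect its schema.
JobConfigurationLoad load = new JobConfigurationLoad()
        .setSourceFormat("NEWLINE_DELIMITED_JSON")
        .setDestinationTable(
                new TableReference()
                        .setProjectId(projectId)
                        .setDatasetId(dataSetId)
                        .setTableId(tableId))
        .setCreateDisposition("CREATE_IF_NEEDED")
        .setWriteDisposition(writeDisposition)
        .setSourceUris(Collections.singletonList(sourceUri))
        .setAutodetect(true);

Bigquery.Jobs.Insert insert = bigquery.jobs().insert(projectId,
        new Job().setConfiguration(new JobConfiguration().setLoad(load)));

// Submit the job to BigQuery.
Job myInsertJob = insert.execute();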

The JSON file used as source data, which sourceUri points to, looks like this:

[
  {
    "stringField1": "value1",
    "numberField2": "123456789"
  }
]

Even though I used setCreateDisposition("CREATE_IF_NEEDED"), I still receive the error: "Not found: Table ..."

Is there another method in the API, or a better approach than the one above, that avoids having to provide the schema?

Recommended answer

The code in your question is perfectly fine, and it does create the table if it doesn't exist. However, it fails when you use a partition ID in place of the table ID, i.e. when the destination table ID is "table$20170323", which is what you used in your job. In order to write to a partition, you have to create the table first.
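
One way to guard against this is to check the destination ID for a partition decorator before submitting the load job. The following is a minimal sketch using only the Java standard library; the class name TableIds and its two methods are hypothetical helpers introduced here for illustration, not part of the BigQuery client library.

```java
import java.util.Optional;

// Hypothetical helper (not part of any BigQuery library): splits a destination
// ID such as "table$20170323" into the base table ID and the partition decorator.
final class TableIds {
    private TableIds() {}

    // Returns the table ID with any "$partition" suffix removed.
    static String baseTableId(String tableId) {
        int i = tableId.indexOf('$');
        return i < 0 ? tableId : tableId.substring(0, i);
    }

    // Returns the part after '$' if the ID carries a partition decorator.
    static Optional<String> partitionId(String tableId) {
        int i = tableId.indexOf('$');
        return i < 0 ? Optional.empty() : Optional.of(tableId.substring(i + 1));
    }

    public static void main(String[] args) {
        String destination = "table$20170323";
        System.out.println(baseTableId(destination));                  // prints "table"
        System.out.println(partitionId(destination).orElse("(none)")); // prints "20170323"
    }
}
```

If partitionId returns a value, the table named by baseTableId would need to exist before the load job runs; when no decorator is present, the load job from the question can rely on CREATE_IF_NEEDED and autodetect as written.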
