Does Google BigQuery support ARRAY<STRING>?


Problem description

I am pushing data from Google Dataflow to Google BigQuery. I have a TableRow object with data in it, and one of the columns in the TableRow contains an array of strings.

I found that Google BigQuery supports an array column type, so I tried to create the table with ARRAY<STRING> as the column type. But I got the error below:

com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Invalid value for: ARRAY<STRING> is not a valid value",
    "reason" : "invalid"
  } ],
  "message" : "Invalid value for: ARRAY<STRING> is not a valid value"
}
com.google.cloud.dataflow.sdk.util.UserCodeException.wrapIf(UserCodeException.java:47)
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.wrapUserCodeException(DoFnRunnerBase.java:369)
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.finishBundle(DoFnRunnerBase.java:162)
com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:194)
com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.finishBundle(ForwardingParDoFn.java:47)

Here is the code that I use to publish values into BigQuery

    .apply(BigQueryIO.Write.named("Write enriched data")
               .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
               .withSchema(getSchema())
               .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
               .to("table_name"));

And here is the schema construction

private static TableSchema getSchema() {
    List<TableFieldSchema> fields = new ArrayList<>();

    fields.add(new TableFieldSchema().setName("column1").setType("STRING"));
    fields.add(new TableFieldSchema().setName("column2").setType("STRING"));
    fields.add(new TableFieldSchema().setName("array_column").setType("ARRAY<STRING>"));

    return new TableSchema().setFields(fields);
}

How can I insert an array of strings into a BigQuery table?

Solution

To define an ARRAY<STRING> column in BigQuery, I set the field's type to 'STRING' and its mode to 'REPEATED'.

In Python, for instance, it is defined as field = SchemaField('field_1', 'STRING', mode='REPEATED')

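For the Java client, as far as I can see, the same options are available: you can define the TYPE (http://googlecloudplatform.github.io/google-cloud-java/0.18.0/apidocs/com/google/cloud/bigquery/Field.Type.html) as STRING and the MODE as REPEATED.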
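
Applied to the getSchema() method from the question, that advice might look like the sketch below. It assumes the TableFieldSchema, TableSchema and TableRow model classes from the google-api-services-bigquery library (the ones the Dataflow SDK uses) are on the classpath. The class name RepeatedFieldSchema, the exampleRow() helper and the sample values are hypothetical and only there for illustration.

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper class, used only to make the sketch self-contained.
public class RepeatedFieldSchema {

    // Same schema as in the question, except that the array column is
    // declared as type STRING with mode REPEATED instead of "ARRAY<STRING>".
    static TableSchema getSchema() {
        List<TableFieldSchema> fields = new ArrayList<>();

        fields.add(new TableFieldSchema().setName("column1").setType("STRING"));
        fields.add(new TableFieldSchema().setName("column2").setType("STRING"));
        fields.add(new TableFieldSchema()
                .setName("array_column")
                .setType("STRING")
                .setMode("REPEATED"));

        return new TableSchema().setFields(fields);
    }

    // Illustrative row: the repeated field is populated with a List of strings.
    static TableRow exampleRow() {
        return new TableRow()
                .set("column1", "a")
                .set("column2", "b")
                .set("array_column", Arrays.asList("x", "y", "z"));
    }

    public static void main(String[] args) {
        // Print the mode of the third field and the sample array value.
        System.out.println(getSchema().getFields().get(2).getMode());
        System.out.println(exampleRow().get("array_column"));
    }
}
```

The BigQueryIO.Write call from the question can stay as it is, since only the schema definition changes.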
