Autodetect BigQuery schema within Dataflow?


Problem description

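Is it possible to use the equivalent of --autodetect in Dataflow?

That is, can we load data into a BigQuery table without specifying a schema, in the same way that we can load data from a CSV file with --autodetect?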



(potentially related question)

Solution

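If you use protocol buffers as the objects in your PCollections (which should perform very well on the Dataflow back end), you may be able to use a utility I wrote a while ago. At runtime, it inspects the protobuf descriptor and converts the protobuf schema into a BigQuery schema.

I quickly uploaded it to GitHub. It is a work in progress, but you may be able to use it as is, or take it as inspiration for writing something similar with Java reflection (I may do that myself at some point).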

You can use the utility as follows:

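    // Derive the BigQuery schema from the descriptor of the generated protobuf class.
    TableSchema schema = ProtobufUtils.makeTableSchema(ProtobufClass.getDescriptor());
    // Write the PCollection; WRITE_TRUNCATE replaces any existing table contents.
    enhanced_events.apply(BigQueryIO.Write.to(tableToWrite)
        .withSchema(schema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));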

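Here, the create disposition creates the table with the specified schema if it does not already exist, and ProtobufClass is the class generated from your Protobuf schema by the proto compiler.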
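The Java reflection variant mentioned above could look roughly like the following. This is only a minimal sketch, not the utility from the answer: ReflectionSchemaSketch, Column, inferColumns and EnhancedEvent are hypothetical names made up for illustration, and the type mapping covers only a handful of flat field types. To stay self-contained, it produces plain name/type/mode triples rather than real BigQuery model objects.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: infers BigQuery-style column definitions from a Java class.
class ReflectionSchemaSketch {

    // Minimal stand-in for one BigQuery column: name, type and mode.
    static final class Column {
        final String name;
        final String type;
        final String mode;

        Column(String name, String type, String mode) {
            this.name = name;
            this.type = type;
            this.mode = mode;
        }

        @Override
        public String toString() {
            return name + ":" + type + ":" + mode;
        }
    }

    // Maps a Java type to a BigQuery type name (assumed mapping, flat types only).
    static String toBigQueryType(Class<?> t) {
        if (t == String.class) return "STRING";
        if (t == int.class || t == Integer.class
                || t == long.class || t == Long.class) return "INTEGER";
        if (t == float.class || t == Float.class
                || t == double.class || t == Double.class) return "FLOAT";
        if (t == boolean.class || t == Boolean.class) return "BOOLEAN";
        throw new IllegalArgumentException("Unsupported field type: " + t.getName());
    }

    // Builds one column per instance field; array fields become REPEATED columns.
    // Note: the order returned by getDeclaredFields() is not guaranteed by the JDK.
    static List<Column> inferColumns(Class<?> cls) {
        List<Column> columns = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            Class<?> t = f.getType();
            if (t.isArray()) {
                columns.add(new Column(f.getName(),
                        toBigQueryType(t.getComponentType()), "REPEATED"));
            } else {
                columns.add(new Column(f.getName(), toBigQueryType(t), "NULLABLE"));
            }
        }
        return columns;
    }

    // Hypothetical element type, used only to exercise the sketch.
    static class EnhancedEvent {
        String userId;
        long timestampMillis;
        double score;
        boolean valid;
        String[] tags;
    }

    public static void main(String[] args) {
        for (Column c : inferColumns(EnhancedEvent.class)) {
            System.out.println(c);
        }
    }
}
```

In a real pipeline, each triple would presumably be copied into a TableFieldSchema (setName, setType, setMode) and the resulting list set on a TableSchema before it is passed to withSchema. Nested objects, timestamps and collection types are left out here and would need their own mapping rules.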


