How do I write to BigQuery a schema computed during execution of the same Dataflow pipeline?


Question

My scenario is a variation on the one discussed here: How do I write to BigQuery using a schema computed during Dataflow execution?

In this case, the goal is the same (read a schema during execution, then write a table with that schema to BigQuery), but I want to accomplish it within a single pipeline.

For example, I'd like to write a CSV file to BigQuery and avoid fetching the file twice (once to read schema, once to read data).

Is this possible? If so, what's the best approach?

My current best guess is to read the schema into a PCollection via a side output and then use that to create the table (with a custom PTransform) before passing the data to BigQueryIO.Write.

Answer

If you use BigQueryIO.Write to create the table, then the schema needs to be known when the table is created.

Your proposed solution of not specifying the schema when you create the BigQueryIO.Write transform might work, but you might get an error because the table doesn't exist and you aren't configuring BigQueryIO.Write to create it if needed.

You might want to consider reading just enough of your CSV files in your main program to determine the schema before running your pipeline. This would avoid the complexity of determining the schema at runtime. You would still incur the cost of the extra read but hopefully that's minimal.
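For example, the schema could be derived in the main program by reading only the header row and a sample data row before the pipeline is constructed. A minimal sketch using Python's standard `csv` module; the type-guessing heuristic and the `infer_bigquery_schema` helper are illustrative assumptions, not part of the original answer:

```python
import csv
import io

def infer_bigquery_schema(csv_text, sample_rows=1):
    """Infer a BigQuery-style schema from a CSV header plus sample rows.

    Returns a list of {"name": ..., "type": ...} dicts, the shape used by
    the BigQuery API's tableSchema.fields. Type guessing is a simple
    INTEGER/FLOAT/STRING heuristic for illustration only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = [row for _, row in zip(range(sample_rows), reader)]

    def guess_type(values):
        if not values:
            return "STRING"  # no data rows observed; fall back to STRING
        for v in values:
            try:
                int(v)
            except ValueError:
                break
        else:
            return "INTEGER"  # every sample parsed as an int
        for v in values:
            try:
                float(v)
            except ValueError:
                return "STRING"
        return "FLOAT"

    return [
        {"name": name, "type": guess_type([row[i] for row in samples])}
        for i, name in enumerate(header)
    ]

schema = infer_bigquery_schema("id,price,label\n1,2.5,foo\n")
# schema → [{"name": "id", "type": "INTEGER"},
#           {"name": "price", "type": "FLOAT"},
#           {"name": "label", "type": "STRING"}]
```

The resulting field list could then be handed to BigQueryIO.Write (or, in the Python SDK, `WriteToBigQuery`) when the pipeline is built, so only the header and a few sample lines are read twice rather than the whole file.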

Alternatively, you could create a custom sink to write your data to BigQuery. Your sink could write the data to GCS, and its finalize method could then create a BigQuery load job. The custom sink could infer the schema by looking at the records and create the BigQuery table with the appropriate schema.
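The finalize step of such a sink might assemble a load-job request like the following. This is a sketch of the JSON body for BigQuery's jobs.insert API, assuming the records were observed as Python dicts; the `build_load_job_config` helper and the project/dataset identifiers are placeholders:

```python
def build_load_job_config(gcs_uris, records, project, dataset, table):
    """Build a BigQuery jobs.insert configuration dict for loading CSV
    files from GCS, inferring the table schema from observed records."""
    type_map = {int: "INTEGER", float: "FLOAT", bool: "BOOLEAN", str: "STRING"}
    # Infer each field's type from the first observed record; a real sink
    # would look at more records and reconcile conflicting types.
    fields = [
        {"name": name, "type": type_map.get(type(value), "STRING")}
        for name, value in records[0].items()
    ]
    return {
        "configuration": {
            "load": {
                "sourceUris": gcs_uris,
                "sourceFormat": "CSV",
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "schema": {"fields": fields},
                # Ask BigQuery to create the table with this schema.
                "createDisposition": "CREATE_IF_NEEDED",
            }
        }
    }
```

Because the load job carries `createDisposition: CREATE_IF_NEEDED` together with the inferred schema, BigQuery creates the table at load time, so the schema never has to be known when the pipeline is constructed.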
