How to extract contents from PCollection in Cloud Dataflow?

Question

I just want to know how to extract the contents of a PCollection. Say I have applied Count.Globally, so the resulting PCollection contains a single number. How can I extract it as a Long value?

Thanks.

Recommended answer

It depends on how you want to use that value.

If you want to read that value after your pipeline finishes, you could use one of the write transforms (e.g. AvroIO.Write) to write it to some output, which whatever code executes after your pipeline can then read.

If you want to use that value in a subsequent part of your pipeline, you could apply a View transform to generate a PCollectionView, which you could then pass as a side input to other transforms.

Consider a simple example where the goal is to print out the count. The count won't be available until after the pipeline runs, so in this case we could do the following:

  • Define a DoFn<Long, String> which we apply to the count in order to turn the Long into the message we want to print out.
  • Apply a TextIO.Write transform to write the message to a file.
  • Run the job and wait for it to finish. If we want to execute using the Dataflow Service, we can use BlockingDataflowPipelineRunner to wait for the job to finish.
  • After the job finishes, read the text file it created to get the message and print it out.
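
The last step above can be sketched in plain Java, under a few stated assumptions: the pipeline wrote to a local directory (a Cloud Storage bucket would need a storage client instead), the output files follow TextIO's sharded naming such as count-00000-of-00001, and the DoFn formatted its message as "Count: <n>". The class name CountOutputReader, its methods, and that message format are hypothetical and chosen only for illustration; they are not part of the Dataflow SDK.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the names here are illustrative, not part of any SDK.
class CountOutputReader {

    // Scans every file in `dir` whose name starts with `prefix` (the output
    // shards) and returns the single non-blank line found. A globally combined
    // count produces one element, so exactly one line is expected in total.
    static String readMessage(Path dir, String prefix) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> shards = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path shard : shards) {
                for (String line : Files.readAllLines(shard)) {
                    if (!line.trim().isEmpty()) {
                        lines.add(line);
                    }
                }
            }
        }
        if (lines.size() != 1) {
            throw new IllegalStateException(
                "expected exactly one output line, found " + lines.size());
        }
        return lines.get(0);
    }

    // Recovers the Long from a message of the assumed form "Count: <n>".
    static long parseCount(String message) {
        return Long.parseLong(message.substring(message.indexOf(':') + 1).trim());
    }
}
```

Once the blocking run returns, the driver program could call CountOutputReader.readMessage(outputDir, "count"), print the message, and pass it to parseCount if the numeric value is needed. Failing loudly when the line count is not one guards against reading a stale or partially written output directory.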
