How to extract contents from a PCollection in Cloud Dataflow?
Question
How do I extract the contents of a PCollection? Say I have applied Count.Globally, so the resulting PCollection contains a single number. How can I extract it as a Long value?
Thanks.
Recommended answer
It depends on how you want to use that value.
If you want to read the value after your pipeline finishes, you can use one of the write transforms (e.g. AvroIO.Write) to write it to an output, and then read that output from whatever code runs after the pipeline.
If you want to use the value in a later part of the same pipeline, you can apply a View transform to produce a PCollectionView, which you can then pass as a side input to other transforms.
Consider a simple example where the goal is to print out the count. The count isn't available until after the pipeline runs, so in this case we could do the following:
- Define a DoFn<Long, String> and apply it to the count to turn the Long into the message we want to print.
- Apply a TextIO.Write transform to write the message to a file.
- Run the job and wait for it to finish. To execute on the Dataflow Service, we can use BlockingDataflowRunner to wait for the job to finish.
- After the job finishes, read the text file that was created to get the message, and print it out.
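The last step can be done in plain Java once the output is on a local disk. Below is a minimal sketch, not part of the original answer, under these assumptions: the pipeline wrote to a local directory (output on Cloud Storage would need to be copied down or read with a storage client instead), and the text sink used its default sharded file names such as `count-00000-of-00001`. The class name `ReadCountOutput`, the helper `readShards`, and the `count` prefix are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReadCountOutput {

    // Reads every line from the files in `dir` whose names start with
    // `prefix` followed by "-".
    // Assumption: output was written locally with the default sharded
    // naming, e.g. "count-00000-of-00001". The prefix is used inside a
    // glob, so it should not contain glob characters such as * ? [ ] { }.
    static List<String> readShards(Path dir, String prefix) throws IOException {
        List<Path> shards = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "-*")) {
            for (Path shard : stream) {
                shards.add(shard);
            }
        }
        // Directory iteration order is unspecified, so sort for stable output.
        Collections.sort(shards);

        List<String> lines = new ArrayList<>();
        for (Path shard : shards) {
            lines.addAll(Files.readAllLines(shard)); // decodes as UTF-8
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults: current directory and the prefix "count".
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String prefix = args.length > 1 ? args[1] : "count";
        for (String line : readShards(dir, prefix)) {
            System.out.println(line);
        }
    }
}
```

With a global count there is only one element, so there will normally be a single line across all shards. Reading every matching shard rather than hard-coding one file name avoids depending on how many shards the runner chose to write.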