How to get a list of elements out of a PCollection in Google Dataflow and use it in the pipeline to loop Write Transforms?

Published: 2021/11/11 22:28:00 · Tags: python, google-bigquery, google-cloud-dataflow, apache-beam

Problem description

I am using Google Cloud Dataflow with the Python SDK.

I want to:

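- Get a list of unique dates from the main PCollection.
- Loop through the dates in that list to create filtered PCollections (each with a unique date), and write each filtered PCollection to its partition in a time-partitioned table in BigQuery.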
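One way to get the effect of the second step without looping over dates while the pipeline is being built is to compute each row's destination partition individually. BigQuery addresses a single day partition with a `$YYYYMMDD` decorator on the table name, and Beam's `WriteToBigQuery` accepts a callable for its `table` argument that receives each element and returns that element's table. The pure-Python sketch below shows such a callable; it assumes each row is a dict with an ISO-format `date` string, and `TABLE_SPEC`, `partition_table_spec`, and `destination_for` are hypothetical names.

```python
from datetime import date

# Hypothetical destination table; replace with a real project:dataset.table.
TABLE_SPEC = "my-project:my_dataset.events"


def partition_table_spec(table_spec: str, day: date) -> str:
    """Append BigQuery's single-day partition decorator, e.g. '...events$20211111'."""
    return f"{table_spec}${day:%Y%m%d}"


def destination_for(row: dict) -> str:
    """Per-element table callable: route a row to the partition for its own date.

    Assumes (hypothetically) that each row carries an ISO date string
    under the 'date' key, such as '2021-11-11'.
    """
    return partition_table_spec(TABLE_SPEC, date.fromisoformat(row["date"]))


if __name__ == "__main__":
    print(destination_for({"date": "2021-11-11", "value": 1}))
    # my-project:my_dataset.events$20211111
```

Such a function could then be passed as `beam.io.WriteToBigQuery(table=destination_for, ...)`. Whether a partition decorator is accepted as a destination can depend on the write method and the table's partitioning setup, so check the `WriteToBigQuery` documentation for your SDK version.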

How can I get that list? After the following combine transform, I created a ListPCollectionView object, but I cannot iterate over that object:

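import apache_beam as beam


class ToUniqueList(beam.CombineFn):
    """Combine all elements into one list of unique values (here: dates)."""

    def create_accumulator(self):
        return []

    def add_input(self, accumulator, element):
        # Keep only the first occurrence of each element.
        if element not in accumulator:
            accumulator.append(element)
        return accumulator

    def merge_accumulators(self, accumulators):
        # Each accumulator is a list, and lists are unhashable, so
        # set(accumulators) raises TypeError. Flatten and de-duplicate instead.
        merged = []
        for accumulator in accumulators:
            for element in accumulator:
                if element not in merged:
                    merged.append(element)
        return merged

    def extract_output(self, accumulator):
        return accumulator


def get_list_of_dates(pcoll):
    # Returns a PCollection with a single element: the list of unique dates.
    return (pcoll
            | 'get the list of dates' >> beam.CombineGlobally(ToUniqueList()))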
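A `CombineFn` is not necessarily applied in one pass: a runner may build a separate accumulator per bundle of input and then hand all of those partial accumulators to `merge_accumulators`, which therefore receives several lists at once and must flatten them. The sketch below simulates that lifecycle in pure Python, without Beam; `UniqueListCombiner` is a hypothetical stand-in that mirrors the four `CombineFn` methods, and `simulate_combine_globally` is only a rough model of what a runner may do, not Beam's actual execution logic.

```python
from typing import Iterable, List


class UniqueListCombiner:
    """Hypothetical stand-in mirroring the four beam.CombineFn methods."""

    def create_accumulator(self) -> List[str]:
        return []

    def add_input(self, accumulator: List[str], element: str) -> List[str]:
        if element not in accumulator:
            accumulator.append(element)
        return accumulator

    def merge_accumulators(self, accumulators: Iterable[List[str]]) -> List[str]:
        # Each accumulator is itself a list, so flatten before de-duplicating.
        merged = self.create_accumulator()
        for acc in accumulators:
            for element in acc:
                merged = self.add_input(merged, element)
        return merged

    def extract_output(self, accumulator: List[str]) -> List[str]:
        return accumulator


def simulate_combine_globally(fn, bundles: Iterable[Iterable[str]]) -> List[str]:
    """Rough model: one accumulator per bundle, then a single merge."""
    partials = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for element in bundle:
            acc = fn.add_input(acc, element)
        partials.append(acc)
    return fn.extract_output(fn.merge_accumulators(partials))


if __name__ == "__main__":
    bundles = [["2021-11-10", "2021-11-11"], ["2021-11-11", "2021-11-12"]]
    print(simulate_combine_globally(UniqueListCombiner(), bundles))
    # ['2021-11-10', '2021-11-11', '2021-11-12']
```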

Am I doing it all wrong? What is the best way to do that?

Thanks.
