数据流中的流水线到Bigtable Python [英] Streaming Pipeline in Dataflow to Bigtable Python

查看：15 发布时间：2022/4/6 11:58:03 python-3.x google-cloud-dataflow google-cloud-bigtable

本文介绍了数据流中的流水线到Bigtable Python的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我想要读取pubSub主题，并使用用Python编写的数据流代码将数据写入BigTable。我可以在Java中找到样例代码，但在Python中找不到。如何将一行中的列从pubSub分配给不同的列族并将数据写入BigTable？

推荐答案

要在数据流管道中写入大表，您需要创建直接行并将它们传递给WriteToBigTabledoFn。下面是一个简短的示例，它只是传入行键，并为每个键添加一个单元格，这并不是很花哨：

import datetime
import apache_beam as beam

from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from google.cloud.bigtable import row


class MyOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_argument(
            '--bigtable-project',
            help='The Bigtable project ID, this can be different than your '
                 'Dataflow project',
            default='bigtable-project')
        parser.add_argument(
            '--bigtable-instance',
            help='The Bigtable instance ID',
            default='bigtable-instance')
        parser.add_argument(
            '--bigtable-table',
            help='The Bigtable table ID in the instance.',
            default='bigtable-table')


class CreateRowFn(beam.DoFn):
    def process(self, key):
        direct_row = row.DirectRow(row_key=key)
        direct_row.set_cell(
            "stats_summary",
            b"os_build",
            b"android",
            datetime.datetime.now())
        return [direct_row]


def run(argv=None):
    """Build and run the pipeline."""
    options = MyOptions(argv)
    with beam.Pipeline(options=options) as p:
        p | beam.Create(["phone#4c410523#20190501",
                         "phone#4c410523#20190502"]) | beam.ParDo(
            CreateRowFn()) | WriteToBigTable(
            project_id=options.bigtable_project,
            instance_id=options.bigtable_instance,
            table_id=options.bigtable_table)


if __name__ == '__main__':
    run()

我现在才刚刚开始探索这一点，一旦完成，我可以在GitHub上链接到更精致的版本。希望这能帮助您入门。

这篇关于数据流中的流水线到Bigtable Python的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

数据流中的流水线到Bigtable Python [英] Streaming Pipeline in Dataflow to Bigtable Python

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

数据流中的流水线到Bigtable Python [英] Streaming Pipeline in Dataflow to Bigtable Python

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭