Dataflow Template Cloud Pub/Sub Topic vs Subscription to BigQuery

Question

I'm setting up a simple proof of concept to learn some of the concepts in Google Cloud, specifically Pub/Sub and Dataflow.

I have a Pub/Sub topic named greeting.

I've created a simple Cloud Function that publishes a message to that topic:

const escapeHtml = require('escape-html');
const { Buffer } = require('safe-buffer');
const { PubSub } = require('@google-cloud/pubsub');

exports.publishGreetingHTTP = async (req, res) => {
    let name = 'no name provided';
    if (req.query && req.query.name) {
        name = escapeHtml(req.query.name);
    } else if (req.body && req.body.name) {
        name = escapeHtml(req.body.name);
    }
    const pubsub = new PubSub();
    const topicName = 'greeting';
    const data = JSON.stringify({ hello: name });
    const dataBuffer = Buffer.from(data);
    const messageId = await pubsub.topic(topicName).publish(dataBuffer);
    res.send(`Message ${messageId} published. name=${name}`);
};

I set up a second Cloud Function that is triggered by the topic:

const { Buffer } = require('safe-buffer');

exports.subscribeGreetingPubSub = (data) => {
    const pubSubMessage = data;
    const passedData = pubSubMessage.data ? JSON.parse(Buffer.from(pubSubMessage.data, 'base64').toString()) : { error: 'no data' };

    console.log(passedData);
};

This works great, and I see it registered as a subscription on the topic.

Now I'd like to use Dataflow to send the data to BigQuery.

There appear to be two templates that accomplish this:

  • Cloud Pub/Sub Subscription to BigQuery
  • Cloud Pub/Sub Topic to BigQuery

I don't understand the difference between Topic and Subscription in this context.

https://medium.com/google-cloud/new-updates-to-pub-sub-to-bigquery-templates-7844444e6068 sheds a little light:

Note that a caveat of using subscriptions over topics is that subscriptions are only read once while topics can be read multiple times. Therefore a subscription template cannot support multiple concurrent pipelines reading the same subscription.

But I must say I'm still lost as to what this really means.

Recommended answer

If you use the Topic to BigQuery template, Dataflow will create a subscription behind the scenes for you that reads from the specified topic. If you use the Subscription to BigQuery template, you will need to provide your own subscription.
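
As a rough illustration of that difference in inputs, the sketch below builds the parameter object each template would be launched with. The parameter names inputTopic, inputSubscription and outputTableSpec are the ones I believe the Google-provided templates use, so check the template documentation before relying on them. The project, dataset, table and subscription names are hypothetical.

```javascript
// Hypothetical identifiers, used only for illustration.
const PROJECT = 'my-project';
const TABLE_SPEC = `${PROJECT}:my_dataset.greetings`;

// Topic to BigQuery: you name the topic and Dataflow creates the subscription.
// (Parameter names are assumed from the template docs; verify before use.)
function topicTemplateParams(project, topic, tableSpec) {
    return {
        inputTopic: `projects/${project}/topics/${topic}`,
        outputTableSpec: tableSpec,
    };
}

// Subscription to BigQuery: you name a subscription you created yourself.
function subscriptionTemplateParams(project, subscription, tableSpec) {
    return {
        inputSubscription: `projects/${project}/subscriptions/${subscription}`,
        outputTableSpec: tableSpec,
    };
}

console.log(topicTemplateParams(PROJECT, 'greeting', TABLE_SPEC));
console.log(subscriptionTemplateParams(PROJECT, 'greeting-to-bq', TABLE_SPEC));
```

The only thing that changes between the two is which Pub/Sub resource you point the pipeline at; the BigQuery side is the same.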

You can use Subscription to BigQuery templates to emulate the behavior of a Topic to BigQuery template by creating multiple pipelines, each connected to its own subscription on the same topic.
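
To make the "read once" caveat concrete, here is a toy in-memory model of the delivery semantics. It is not the real Pub/Sub client library; the Topic class and pull function are made up for this sketch. In real Pub/Sub the split between consumers sharing a subscription is arbitrary, not the neat halves shown here.

```javascript
// Toy in-memory model of Pub/Sub delivery; not the real client library.
class Topic {
    constructor() {
        this.subscriptions = [];
    }
    createSubscription() {
        const sub = { queue: [] };
        this.subscriptions.push(sub);
        return sub;
    }
    // Every subscription that exists at publish time gets its own copy.
    publish(message) {
        for (const sub of this.subscriptions) sub.queue.push(message);
    }
}

// A consumer pulls (and thereby acknowledges) the next message.
function pull(sub) {
    return sub.queue.shift();
}

// Case 1: two pipelines share one subscription, so they split the messages.
const shared = new Topic();
const sharedSub = shared.createSubscription();
['a', 'b', 'c', 'd'].forEach((m) => shared.publish(m));
const sharedA = [pull(sharedSub), pull(sharedSub)];
const sharedB = [pull(sharedSub), pull(sharedSub)];

// Case 2: each pipeline has its own subscription, so each sees every message.
const fanOut = new Topic();
const subA = fanOut.createSubscription();
const subB = fanOut.createSubscription();
['a', 'b', 'c', 'd'].forEach((m) => fanOut.publish(m));
const ownA = [...subA.queue];
const ownB = [...subB.queue];

console.log({ sharedA, sharedB, ownA, ownB });
```

In the first case neither pipeline sees the full stream, which is why concurrent pipelines should not share a subscription. In the second case both do.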

For new deployments, using the Subscription to BigQuery template is preferred. If you stop and restart a pipeline using the Topic to BigQuery template, a new subscription will be created, which may cause you to miss some messages that were published while the pipeline was down. The Subscription to BigQuery template doesn't have this disadvantage, since it uses the same subscription even after the pipeline is restarted.
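
The restart behavior can be sketched with the same kind of toy model. Again, this is an in-memory simulation with made-up helper names, not the Pub/Sub API. It assumes only that a subscription receives messages published after it was created, and that the messages are still within the subscription's retention period when the pipeline comes back.

```javascript
// Toy in-memory model of the restart behaviour; not the real Pub/Sub API.
function createTopic() {
    const subscriptions = [];
    return {
        // A new subscription starts empty: it has no earlier messages.
        subscribe() {
            const queue = [];
            subscriptions.push(queue);
            return queue;
        },
        publish(message) {
            for (const queue of subscriptions) queue.push(message);
        },
    };
}

// Topic template: each pipeline run reads from a freshly created subscription.
function runWithTopicTemplate() {
    const topic = createTopic();
    const firstRunSub = topic.subscribe();
    topic.publish('before');
    const received = firstRunSub.splice(0); // first run drains its subscription
    topic.publish('during-downtime');       // pipeline is stopped here
    const secondRunSub = topic.subscribe(); // restart creates a new subscription
    topic.publish('after');
    return received.concat(secondRunSub.splice(0));
}

// Subscription template: every run reads from the subscription you created.
function runWithSubscriptionTemplate() {
    const topic = createTopic();
    const sub = topic.subscribe();
    topic.publish('before');
    const received = sub.splice(0);         // first run drains the subscription
    topic.publish('during-downtime');       // retained while the pipeline is down
    topic.publish('after');
    return received.concat(sub.splice(0));  // restarted run picks up the backlog
}

console.log(runWithTopicTemplate());
console.log(runWithSubscriptionTemplate());
```

In this model the topic-template run never sees the message published during downtime, while the subscription-template run picks it up after the restart.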
