How to make the environment variables reach Dataflow workers as environment variables in python sdk


Question

I am writing a custom sink with the Python SDK, and I am trying to store data to AWS S3. Connecting to S3 requires credentials (an access key and secret key), but hard-coding them is bad for security. I would like the environment variables on my machine to reach the Dataflow workers as environment variables. How can I do that?

Recommended answer

Generally, for transmitting information to workers that you don't want to hard-code, you should use PipelineOptions - please see Creating Custom Options. Then, when constructing the pipeline, extract the parameters from your PipelineOptions object and pass them into your transform (e.g. into your DoFn or sink).

However, for something as sensitive as a credential, passing it in a command-line argument may not be a great idea. I would recommend a more secure approach: put the credential into a file on GCS, and pass the name of the file as a PipelineOption. Then programmatically read the file from GCS, using GcsIO, whenever you need the credential.
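A sketch of that read-at-runtime step, assuming the credentials are stored as JSON; the helper name and the injectable `opener` parameter are illustrative additions so the function can be exercised without real GCS access:

```python
import json


def load_s3_credentials(gcs_path, opener=None):
    """Read a JSON credentials file, by default from GCS via Beam's GcsIO.

    `opener` is injectable purely for testing; on a worker you would call
    load_s3_credentials(path) with the path taken from a PipelineOption.
    """
    if opener is None:
        # Imported lazily so the helper also loads where Beam's GCS
        # extras are not installed.
        from apache_beam.io.gcp.gcsio import GcsIO
        opener = lambda path: GcsIO().open(path, 'r')
    with opener(gcs_path) as f:
        return json.loads(f.read())
```

A natural place to call this is in your DoFn's or sink's setup path (e.g. `start_bundle`), caching the result so each bundle does not re-read the file from GCS.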
