How to make environment variables reach Dataflow workers as environment variables in the Python SDK
Question
I am writing a custom sink with the Python SDK and trying to store data in AWS S3. Connecting to S3 requires credentials (a secret key), but hard-coding them is not good for security reasons. I would like environment variables to reach the Dataflow workers as environment variables. How can I do that?
Answer
Generally, for transmitting information to workers that you don't want to hard-code, you should use PipelineOptions; please see Creating Custom Options. Then, when constructing the pipeline, extract the parameters from your PipelineOptions object and pass them into your transform (e.g. into your DoFn or a sink).
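Declaring a custom option in the Beam Python SDK amounts to registering an extra argparse flag. Below is a minimal sketch of that pattern, kept self-contained by driving the hook with plain `argparse`; the class name `S3Options` and the flag `--s3_credentials_file` are illustrative. In a real pipeline you would subclass `apache_beam.options.pipeline_options.PipelineOptions`, implement the same `_add_argparse_args` classmethod, and pass the options object to `beam.Pipeline(options=...)`:

```python
import argparse

class S3Options:
    """Sketch of Beam's custom-options hook: PipelineOptions subclasses
    register extra command-line flags through _add_argparse_args(parser)."""

    @classmethod
    def _add_argparse_args(cls, parser):
        # Flag name is illustrative; Beam would later expose it as
        # options.s3_credentials_file on the parsed options object.
        parser.add_argument(
            '--s3_credentials_file',
            help='GCS path of a file holding the S3 credentials')

# Standalone demonstration of the parsing Beam performs internally.
parser = argparse.ArgumentParser()
S3Options._add_argparse_args(parser)
args = parser.parse_args(['--s3_credentials_file', 'gs://my-bucket/creds.json'])
print(args.s3_credentials_file)  # gs://my-bucket/creds.json
```

The parsed value can then be handed to your DoFn or sink through its constructor, so the worker never needs the credential baked into the code itself.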
However, for something as sensitive as a credential, passing it in a command-line argument might not be a great idea. I would recommend a more secure approach: put the credential into a file on GCS, and pass the name of the file as a PipelineOption. Then programmatically read the file from GCS whenever you need the credential, using GcsIO.
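Loading the credential inside the worker can then be a small helper that reads and parses the file named by the option. A hedged sketch follows: the helper name, JSON layout, and field names are assumptions, and a local temp file stands in for the GCS object so the example stays self-contained. With Beam installed, `apache_beam.io.gcp.gcsio.GcsIO().open(path)` returns a file-like object much as the builtin `open` does here, so the same helper works with a `gs://` path:

```python
import json
import tempfile

def load_s3_credentials(open_fn, path):
    """Read a JSON credentials file. In a real pipeline, pass GcsIO().open
    as open_fn so that `path` can be a gs:// URI."""
    with open_fn(path) as f:
        return json.load(f)

# Self-contained demo: a local temp file stands in for the GCS object.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    json.dump({'aws_access_key_id': 'AKIA-EXAMPLE',
               'aws_secret_access_key': 'SECRET-EXAMPLE'}, tmp)
    creds_path = tmp.name

creds = load_s3_credentials(open, creds_path)
print(creds['aws_access_key_id'])  # AKIA-EXAMPLE
```

Reading the file lazily (e.g. in the DoFn's `start_bundle`) keeps the secret out of both the code and the command line; only the file's location travels through the pipeline options.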