Dynamic BigQuery query in Dataflow template
Question
I've written a Dataflow job that works great when I run it manually. Here is the relevant section (with some validation code removed for clarity):
parser.add_argument('--end_datetime',
                    dest='end_datetime')
known_args, pipeline_args = parser.parse_known_args(argv)

query = <redacted SQL String with a placeholder for a date>
query = query.replace('#ENDDATETIME#', known_args.end_datetime)

with beam.Pipeline(options=pipeline_options) as p:
    rows = p | 'read query' >> beam.io.Read(beam.io.BigQuerySource(query=query, use_standard_sql=True))
Now I want to create a template and schedule it to run on a regular basis with a dynamic ENDDATETIME. As I understand it, in order to do this I need to change add_argument to add_value_provider_argument per this documentation:
https://cloud.google.com/dataflow/docs/templates/creating-templates
Unfortunately, it appears that ValueProvider values are not available when I need them, they're only available inside the pipeline itself. (please correct me if I'm wrong here...). So I'm kind of stuck.
Does anyone have any pointers on how I could get a dynamic date into my query in a Dataflow template?
Recommended answer
Python currently only supports ValueProvider options for FileBasedSource IOs. You can see this by clicking the Python tab at the link you used, under the "Pipeline I/O and runtime parameters" section: https://cloud.google.com/dataflow/docs/templates/creating-templates
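To illustrate why a runtime parameter cannot be substituted into the query while the template is being built, here is a toy sketch that uses only the standard library. `DeferredValue`, `set_at_runtime`, `build_query` and the query text are hypothetical names invented for this example. They are not Apache Beam's classes or API; they only mimic the general idea that a runtime value can be read once the job is running and not before.

```python
class DeferredValue:
    """Hypothetical stand-in for a runtime parameter (NOT Beam's ValueProvider)."""

    _UNSET = object()

    def __init__(self, name):
        self.name = name
        self._value = self._UNSET

    def is_accessible(self):
        # True only after a value has been supplied at run time.
        return self._value is not self._UNSET

    def get(self):
        if not self.is_accessible():
            raise RuntimeError(f"{self.name} is not available until the job runs")
        return self._value

    def set_at_runtime(self, value):
        # In this toy model, the "runner" calls this when the job is launched.
        self._value = value


# Hypothetical query with the same placeholder style as the question.
QUERY_TEMPLATE = "SELECT * FROM t WHERE ts < '#ENDDATETIME#'"


def build_query(end_datetime):
    # The parameter is resolved only when this function is called.
    return QUERY_TEMPLATE.replace("#ENDDATETIME#", end_datetime.get())


end_datetime = DeferredValue("end_datetime")

# Template-construction time: no value has been supplied yet, so this fails.
try:
    build_query(end_datetime)
    construction_error = None
except RuntimeError as exc:
    construction_error = str(exc)

# Run time: the value is supplied, so the same call now succeeds.
end_datetime.set_at_runtime("2018-01-01 00:00:00")
runtime_query = build_query(end_datetime)
```

In this model the string replacement in the question fails for the same reason: it runs while the pipeline is being constructed, before any runtime value exists. A source can only use such a parameter if it defers reading the value until the job executes, which is the support the answer says is limited to FileBasedSource IOs in Python.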