RuntimeValueProviderError when creating a Google Cloud Dataflow template with Apache Beam Python
Question
I can't stage a Cloud Dataflow template with Python 3.7. Staging fails on the one parameterized argument with apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input, type: str, default_value: 'gs://dataflow-samples/shakespeare/kinglear.txt') not accessible
Staging the same template with Python 2.7 works fine.
I have tried running Dataflow jobs with Python 3.7 and they work fine; only template staging is broken. Is Python 3.7 still not supported for Dataflow templates, or did the syntax for staging change in Python 3?
Here is the pipeline:
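import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions


class WordcountOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # A value-provider argument is resolved when the template runs,
        # not when the template is staged.
        parser.add_value_provider_argument(
            '--input',
            default='gs://dataflow-samples/shakespeare/kinglear.txt',
            help='Path of the file to read from',
            dest="input")


def main(argv=None):
    options = PipelineOptions(flags=argv)
    setup_options = options.view_as(SetupOptions)
    wordcount_options = options.view_as(WordcountOptions)

    with beam.Pipeline(options=setup_options) as p:
        # The error in the stack trace below is raised on this line.
        lines = p | 'read' >> ReadFromText(wordcount_options.input)


if __name__ == '__main__':
    main()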
Here is the full repo with the staging scripts: https://github.com/firemuzzy/dataflow-templates-bug-python3
There was a similar earlier question, but I am not sure how related it is: that one involved Python 2.7, whereas my template stages fine in 2.7 and fails only in 3.7.
How to create Google Cloud Dataflow Wordcount custom template in Python?
**** Stack Trace ****
Traceback (most recent call last):
  File "run_pipeline.py", line 44, in <module>
    main()
  File "run_pipeline.py", line 41, in main
    lines = p | 'read' >> ReadFromText(wordcount_options.input)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 906, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 515, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 490, in apply
    return self.apply(transform, pvalueish)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 525, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 183, in apply
    return m(transform, input, options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 189, in apply_PTransform
    return transform.expand(input)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 542, in expand
    return pvalue.pipeline | Read(self._source)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 515, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 525, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 183, in apply
    return m(transform, input, options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1020, in apply_Read
    return self.apply_PTransform(transform, pbegin, options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 189, in apply_PTransform
    return transform.expand(input)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 863, in expand
    return pbegin | _SDFBoundedSourceWrapper(self.source)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pvalue.py", line 113, in __or__
    return self.pipeline.apply(ptransform, self)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 525, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 183, in apply
    return m(transform, input, options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 189, in apply_PTransform
    return transform.expand(input)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1543, in expand
    | core.ParDo(self._create_sdf_bounded_source_dofn()))
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1517, in _create_sdf_bounded_source_dofn
    estimated_size = source.estimate_size()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py", line 136, in _f
    raise error.RuntimeValueProviderError('%s not accessible' % obj)
apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input, type: str, default_value: 'gs://dataflow-samples/shakespeare/kinglear.txt') not accessible
Recommended answer
Unfortunately, it looks like templates are broken in version 2.18.0 of the Apache Beam Python SDK.
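Judging only from the stack trace in the question, the failure seems to come from the source's estimate_size() being called while the pipeline graph is still being built, at which point a template parameter has no value yet. The sketch below illustrates that pattern with a simplified stand-in; DeferredOption, build_read_step and build_read_step_eagerly are hypothetical names invented for this illustration and are not Beam's classes or API.

```python
class DeferredOption:
    """Simplified stand-in for a template parameter (NOT Beam's implementation)."""

    _UNSET = object()

    def __init__(self, name, default):
        self.name = name
        self.default = default
        self._runtime_value = self._UNSET

    def is_accessible(self):
        # True only once the job has actually started and a value is bound.
        return self._runtime_value is not self._UNSET

    def resolve(self, value=None):
        # Called at run time: use the supplied value, else fall back to the default.
        self._runtime_value = self.default if value is None else value

    def get(self):
        if not self.is_accessible():
            raise RuntimeError("%s not accessible" % self.name)
        return self._runtime_value


def build_read_step(file_pattern):
    """Construction-time code that keeps the parameter deferred."""
    return lambda: "reading " + file_pattern.get()


def build_read_step_eagerly(file_pattern):
    """Construction-time code that reads the parameter too early."""
    return "reading " + file_pattern.get()
```

In this sketch, build_read_step succeeds while the template is being built because it only calls get() when the returned step runs, whereas build_read_step_eagerly raises immediately, which mirrors the "not accessible" error above.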
For now, the workaround is to avoid Beam 2.18.0: in your requirements / dependencies, specify apache-beam[gcp]<2.18.0 or apache-beam[gcp]>2.18.0.
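If you want the staging script to fail fast instead of relying on the dependency pin alone, a small version guard is one option. This is a minimal sketch, assuming 2.18.0 is the only affected release as stated above; parse_version, is_template_staging_broken and BROKEN_RELEASES are hypothetical helper names, not part of Beam.

```python
import re

# Assumption taken from the answer above: only 2.18.0 is affected.
BROKEN_RELEASES = {(2, 18, 0)}


def parse_version(version):
    """Return the leading major.minor.patch of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def is_template_staging_broken(version):
    """True if the given SDK version is one of the known-broken releases."""
    return parse_version(version) in BROKEN_RELEASES
```

In a staging script you could pass apache_beam.__version__ to is_template_staging_broken and abort with a clear message when it returns True.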