Explain Apache Beam Python syntax
Question
I have read through the Beam documentation and also looked through Python documentation but haven't found a good explanation of the syntax being used in most of the example Apache Beam code.
Can anyone explain what the _, |, and >> are doing in the code below? Also, is the text in quotes (i.e. 'ReadTrainingData') meaningful, or could it be exchanged with any other label? In other words, how is that label being used?
train_data = pipeline | 'ReadTrainingData' >> _ReadData(training_data)
evaluate_data = pipeline | 'ReadEvalData' >> _ReadData(eval_data)

input_metadata = dataset_metadata.DatasetMetadata(schema=input_schema)

_ = (input_metadata
     | 'WriteInputMetadata' >> tft_beam_io.WriteMetadata(
         os.path.join(output_dir, path_constants.RAW_METADATA_DIR),
         pipeline=pipeline))

preprocessing_fn = reddit.make_preprocessing_fn(frequency_threshold)
(train_dataset, train_metadata), transform_fn = (
    (train_data, input_metadata)
    | 'AnalyzeAndTransform' >> tft.AnalyzeAndTransformDataset(
        preprocessing_fn))
Recommended answer
Operators in Python can be overloaded. In Beam, | is a synonym for apply, which applies a PTransform to a PCollection to produce a new PCollection. >> allows you to name a step for easier display in various UIs; the string between the | and the >> is used only for these display purposes and for identifying that particular application of the transform.
See https://beam.apache.org/documentation/programming-guide/#transforms