Explain Apache Beam Python syntax

Question

I have read through the Beam documentation and also looked through the Python documentation, but I haven't found a good explanation of the syntax used in most of the example Apache Beam code.

Can anyone explain what the _, |, and >> are doing in the code below? Also, is the text in quotes (e.g. 'ReadTrainingData') meaningful, or could it be exchanged for any other label? In other words, how is that label used?

train_data = pipeline | 'ReadTrainingData' >> _ReadData(training_data)
evaluate_data = pipeline | 'ReadEvalData' >> _ReadData(eval_data)

input_metadata = dataset_metadata.DatasetMetadata(schema=input_schema)

_ = (input_metadata
     | 'WriteInputMetadata' >> tft_beam_io.WriteMetadata(
         os.path.join(output_dir, path_constants.RAW_METADATA_DIR),
         pipeline=pipeline))

preprocessing_fn = reddit.make_preprocessing_fn(frequency_threshold)
(train_dataset, train_metadata), transform_fn = (
  (train_data, input_metadata)
  | 'AnalyzeAndTransform' >> tft.AnalyzeAndTransformDataset(
      preprocessing_fn))

Recommended answer

Operators in Python can be overloaded. In Beam, | is a synonym for apply: it applies a PTransform to a PCollection to produce a new PCollection. >> lets you name a step for easier display in various UIs. The string between the | and the >> is used only for display and to identify that particular application of the transform, so its wording carries no meaning for the computation itself.

See https://beam.apache.org/documentation/programming-guide/#transforms
