Explain Apache Beam Python syntax


Question

I have read through the Beam documentation and also looked through the Python documentation, but I haven't found a good explanation of the syntax used in most of the Apache Beam example code.

Can anyone explain what the _, |, and >> are doing in the code below? Also, is the text in quotes, e.g. 'ReadTrainingData', meaningful, or could it be exchanged for any other label? In other words, how is that label used?

train_data = pipeline | 'ReadTrainingData' >> _ReadData(training_data)
evaluate_data = pipeline | 'ReadEvalData' >> _ReadData(eval_data)

input_metadata = dataset_metadata.DatasetMetadata(schema=input_schema)

_ = (input_metadata
     | 'WriteInputMetadata' >> tft_beam_io.WriteMetadata(
         os.path.join(output_dir, path_constants.RAW_METADATA_DIR),
         pipeline=pipeline))

preprocessing_fn = reddit.make_preprocessing_fn(frequency_threshold)
(train_dataset, train_metadata), transform_fn = (
  (train_data, input_metadata)
  | 'AnalyzeAndTransform' >> tft.AnalyzeAndTransformDataset(
      preprocessing_fn))

Recommended answer

Operators in Python can be overloaded. In Beam, | is a synonym for apply: it applies a PTransform to a PCollection to produce a new PCollection. >> lets you name a step so that it is easier to identify in various UIs. The string between the | and the >> is used only for display and for identifying that particular application of the transform.

See https://beam.apache.org/documentation/programming-guide/#transforms
