How to know which worker a partition is executed at?


Question

I'm just trying to find a way to get the locality of an RDD's partitions in Spark.

After calling RDD.repartition() or PairRDD.combineByKey(), the returned RDD is partitioned. I'd like to know which worker instances the partitions end up on (to examine the partitioning behaviour).

Can anyone shed some light on this?

Answer

An interesting question that, I'm sure, has a not-so-interesting answer :)

First of all, applying transformations to your RDD has nothing to do with worker instances, as they are separate "entities". Transformations create an RDD lineage (= a logical plan), while executors come on stage (no pun intended) only after an action is executed (the DAGScheduler then transforms the logical plan into an execution plan, i.e. a set of stages with tasks).
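
To see this laziness concretely, here is a minimal sketch (assuming a live SparkContext `sc`, e.g. in spark-shell; the RDD and partition count are made up for illustration):

```scala
// repartition() is a transformation: it only extends the lineage.
val rdd = sc.parallelize(1 to 100).repartition(4)
println(rdd.toDebugString) // inspect the lineage (logical plan)
rdd.count()                // action: only now does the DAGScheduler build stages and run tasks on executors
```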

So, I believe the only way to know which executor a partition is executed on is to use org.apache.spark.SparkEnv to access the BlockManager that corresponds to a single executor. That's exactly how Spark knows/tracks executors (by their BlockManagers).
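
As a sketch of that idea (note that SparkEnv is a developer API, and `sc` plus the example RDD are assumptions for illustration): inside a task, SparkEnv.get returns the environment of the executor running that task, and its BlockManager's blockManagerId carries the executor id and host.

```scala
import org.apache.spark.SparkEnv

val rdd = sc.parallelize(1 to 100, 4)

// Run one lightweight task per partition and report where it executed.
val locations = rdd
  .mapPartitionsWithIndex { (idx, _) =>
    val bmId = SparkEnv.get.blockManager.blockManagerId
    Iterator((idx, bmId.executorId, bmId.host))
  }
  .collect()

locations.foreach { case (idx, execId, host) =>
  println(s"partition $idx -> executor $execId @ $host")
}
```

Note this reports where the partitions happened to be computed for this particular action; a later action may schedule them elsewhere.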

You could write an org.apache.spark.scheduler.SparkListener that intercepts onExecutorAdded, onBlockManagerAdded and their *Removed counterparts to learn how executors map to BlockManagers (but I believe SparkEnv is enough).
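
A minimal sketch of such a listener (the class name is made up; `sc` is an assumed live SparkContext):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockManagerAdded, SparkListenerExecutorAdded}

// Logs executor and BlockManager registrations as they happen,
// so you can build the executor -> BlockManager mapping yourself.
class ExecutorTracker extends SparkListener {
  override def onExecutorAdded(e: SparkListenerExecutorAdded): Unit =
    println(s"executor ${e.executorId} added on host ${e.executorInfo.executorHost}")

  override def onBlockManagerAdded(b: SparkListenerBlockManagerAdded): Unit =
    println(s"BlockManager ${b.blockManagerId} registered")
}

sc.addSparkListener(new ExecutorTracker)
```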

