Custom source/sink configurations not getting recognized


Problem Description

I've written a custom metrics source/sink for my Spark streaming app and I am trying to initialize it from metrics.properties - but that doesn't work from executors. I don't have control over the machines in the Spark cluster, so I can't copy the properties file into $SPARK_HOME/conf/ on the cluster. I have it in the fat jar where my app lives, but by the time my fat jar is downloaded to the worker nodes in the cluster, the executors have already started and their metrics system is already initialized - thus they don't pick up my file with the custom source configuration in it.

Following this post, I've specified 'spark.files=metrics.properties' and 'spark.metrics.conf=metrics.properties', but by the time 'metrics.properties' is shipped to the executors, their metrics system is already initialized.
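For reference, this is a minimal sketch of how those two settings might be supplied together, e.g. in spark-defaults.conf or as --conf flags to spark-submit (it assumes metrics.properties sits in the submission directory):

```properties
# spark-defaults.conf (or pass each entry as --conf key=value to spark-submit)
# Ship the file to executors, and point the metrics system at it.
spark.files         metrics.properties
spark.metrics.conf  metrics.properties
```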

If I initialize my own metrics system, it picks up my file, but then I'm missing master/executor level metrics/properties (e.g. executor.sink.mySink.propName=myProp - can't read 'propName' from 'mySink') since they are initialized by Spark's metrics system.
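For context, the kind of metrics.properties entries in question might look like the following sketch (the class names are hypothetical placeholders, not from the original post; only the mySink/propName line appears in the question):

```properties
# metrics.properties - custom source/sink wiring (hypothetical class names)
executor.source.mySource.class=com.example.MySource
executor.sink.mySink.class=com.example.MySink
executor.sink.mySink.propName=myProp
```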

Is there a (programmatic) way to have 'metrics.properties' shipped before executors initialize their metrics system?

Update1: I am trying this on a standalone Spark 2.0.0 cluster.

Update2: Thought of a hack to achieve this - before starting your 'actual' Spark job, start a dummy job that copies metrics.properties onto each worker. Then start your actual job with the pre-known file location. Cons - if a worker dies and another worker takes its place, it won't have this file at the pre-known path. Alternative - when a new worker machine starts, it also pulls metrics.properties from your git repo and places it at the pre-known path. Although it may work, it's terribly hacky, and the preferred solution would be for Spark to support this internally.

Recommended Answer

SparkConf only loads local system properties if they start with the prefix spark. - have you tried loading your properties with the spark. prefix added?
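To illustrate the suggestion: recent Spark versions can, as far as I know, also take the metrics configuration directly as spark.metrics.conf.*-prefixed properties, which are part of SparkConf and therefore reach executors at launch time. A hedged sketch (the sink class name is a hypothetical placeholder):

```properties
# Equivalent metrics config expressed as spark.-prefixed properties,
# passed via --conf or SparkConf (class name is a placeholder)
spark.metrics.conf.executor.sink.mySink.class=com.example.MySink
spark.metrics.conf.executor.sink.mySink.propName=myProp
```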
