Spark streaming custom metrics

Question

I'm working on a Spark Streaming program that reads a Kafka stream, applies a very basic transformation to it, and then inserts the data into a DB (VoltDB, if that's relevant). I'm trying to measure the rate at which I insert rows into the DB. I think metrics (exposed via JMX) could be useful here, but I can't find out how to add custom metrics to Spark. I've looked at Spark's source code and also found a thread on the subject, but it doesn't work for me. I also enabled the JMX sink in the conf.metrics file. What doesn't work is that my custom metrics don't show up in JConsole.

Could someone explain how to add custom metrics (preferably via JMX) to Spark Streaming? Or, alternatively, how to measure my insertion rate into my DB (specifically VoltDB)? I'm using Spark with Java 8.

Recommended answer

OK, after digging through the source code I found out how to add my own custom metrics. It requires three things:

  1. Create my own custom source (the original answer linked to an example).
  2. Enable the JMX sink in the Spark metrics.properties file. The specific line I used is: *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink, which enables JmxSink for all instances.
  3. Register my custom source in the SparkEnv metrics system (the original answer linked to an example of how to do this). I had actually viewed that link before but missed the registration part, which prevented me from seeing my custom metrics in JVisualVM.
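
For step 2, a minimal sketch of what the metrics.properties file might contain. The first entry is the line quoted in the answer. The commented-out per-instance entries are an assumption added for illustration, based on Spark's `[instance].sink.[name].[option]` configuration syntax. Spark looks for this file at `$SPARK_HOME/conf/metrics.properties` by default, and the `spark.metrics.conf` property can point it elsewhere.

```properties
# Enable the JMX sink for every instance (driver, executor, ...), as in step 2.
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

# Assumed alternative, not from the answer: enable the sink per instance instead.
# driver.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
# executor.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
```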

I'm still struggling with how to actually count the number of insertions into VoltDB, because the code runs on the executors, but that's a subject for a different topic :)
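
The answer leaves the counting itself open. One possible approach, sketched here with plain JMX from the Java standard library rather than Spark's metrics system, is to keep a single thread-safe counter per JVM and expose it as a standard MBean. All names here (`InsertCounterSketch`, `InsertCounter`, `recordInserts`, the object name `example.metrics:type=InsertCounter`) are hypothetical and not part of Spark or VoltDB. Each executor runs in its own JVM, so each one would expose its own counter, and an aggregate would have to be computed elsewhere.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.LongAdder;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class InsertCounterSketch {

    // Hypothetical JMX object name; this is what a JMX client such as JConsole would list.
    static final String OBJECT_NAME = "example.metrics:type=InsertCounter";

    // Management interface of a standard MBean: public, and named <ImplClass>MBean.
    public interface InsertCounterMBean {
        long getInsertedRows();
    }

    // Thread-safe counter; LongAdder handles many concurrent writers well.
    public static class InsertCounter implements InsertCounterMBean {
        private final LongAdder insertedRows = new LongAdder();

        // Call this after each successful batch insert (hypothetical hook).
        public void recordInserts(long rows) {
            insertedRows.add(rows);
        }

        @Override
        public long getInsertedRows() {
            return insertedRows.sum();
        }
    }

    private static final InsertCounter COUNTER = new InsertCounter();
    private static boolean registered = false;

    // Returns this JVM's single counter, registering it with the platform
    // MBean server on first use.
    static synchronized InsertCounter counter() {
        if (!registered) {
            try {
                MBeanServer server = ManagementFactory.getPlatformMBeanServer();
                server.registerMBean(COUNTER, new ObjectName(OBJECT_NAME));
                registered = true;
            } catch (Exception e) {
                throw new IllegalStateException("could not register " + OBJECT_NAME, e);
            }
        }
        return COUNTER;
    }

    // Reads the value back through JMX, the same way an external JMX client would.
    static long readViaJmx() {
        counter(); // make sure the MBean is registered
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return (Long) server.getAttribute(new ObjectName(OBJECT_NAME), "InsertedRows");
        } catch (Exception e) {
            throw new IllegalStateException("could not read " + OBJECT_NAME, e);
        }
    }

    public static void main(String[] args) {
        InsertCounter c = counter();
        c.recordInserts(3);
        c.recordInserts(2);
        System.out.println("inserted rows via JMX: " + readViaJmx());
    }
}
```

In a streaming job, the code that writes a partition to the database would call `InsertCounterSketch.counter().recordInserts(n)` after each batch. An insertion rate can then be derived by sampling the `InsertedRows` attribute at two points in time and dividing the difference by the elapsed time.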

I hope this helps others.
