Why is the RDD not persisted in memory for every iteration in Spark?


Question

I use Spark for a machine learning application. Spark and Hadoop share the same compute cluster, without any resource manager such as YARN, so Hadoop jobs can run while a Spark task is running.

But the machine learning application runs very slowly. I found that on every iteration, some workers need to re-add some RDD blocks to memory, like this:

243413 14/07/23 13:30:07 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_2_17 in memory on XXX:48238 (size: 118.3 MB, free: 16.2 GB)
243414 14/07/23 13:30:07 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_2_17 in memory on XXX:48238 (size: 118.3 MB, free: 16.2 GB)
243415 14/07/23 13:30:08 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_2_19 in memory on TS-XXX:48238 (size: 119.0 MB, free: 16.1 GB)

So I think the recomputation needed to reload the RDDs is what makes the application so slow.

My question is: why was the RDD not persisted in memory when there was enough free memory? Is it because of the Hadoop jobs?

I added the following JVM parameters: -Xmx10g -Xms10g
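Note that -Xmx/-Xms only size the heap of whichever JVM they are passed to. In standalone mode, the heap available to executors for caching RDD blocks is normally configured through Spark's own settings instead. A hypothetical spark-defaults.conf sketch (the values here are illustrative, not tuned for this cluster):

```
# Illustrative only -- adjust to your machines' actual memory.
spark.executor.memory        10g
# Fraction of the executor heap reserved for cached RDD blocks
# (spark.storage.memoryFraction, default 0.6 in Spark of this era).
spark.storage.memoryFraction 0.6
```

If cached blocks exceed this storage fraction, Spark evicts them and must recompute the partitions later, which would produce repeated "Added rdd_..." log lines like the ones above.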

After that there were fewer RDD "Added" events than before, and the task run times were shorter. But the total time for one stage is still too large. From the web UI, I found that:

For every stage, the workers do not all start at the same time. For example, only after worker_1 had finished 10 tasks did worker_2 appear on the web UI and start its tasks. This leads to a long stage time.

Our Spark cluster works in standalone mode.

Answer

It is hard to say what is wrong with your job, but here are some hints.

First, you can try calling persist() on intermediate RDDs to mark that you want them cached. Second, Spark automatically stores the results of shuffle operations on disk at each node, so perhaps the problem is not in caching at all.
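The effect of persist() comes from Spark's lazy evaluation: without it, the whole lineage of an RDD is recomputed every time an action runs. The following is a toy sketch in plain Python (not Spark itself; `LazyDataset`, `collect`, and `cache` are hypothetical stand-ins for an RDD and its API) that illustrates why an uncached lazy dataset is recomputed on every iteration:

```python
compute_count = 0  # counts how many times the transform actually runs

def expensive_transform(x):
    """Stand-in for a costly per-record computation."""
    global compute_count
    compute_count += 1
    return x * 2

class LazyDataset:
    """Toy lazy collection: collect() recomputes unless cache() was called."""
    def __init__(self, source, fn):
        self._source = list(source)
        self._fn = fn
        self._cached = None

    def collect(self):
        if self._cached is not None:
            return self._cached          # reuse materialized result
        return [self._fn(x) for x in self._source]  # recompute from scratch

    def cache(self):
        # Materialize once and keep in memory, like rdd.persist() in Spark.
        self._cached = [self._fn(x) for x in self._source]
        return self

ds = LazyDataset(range(1000), expensive_transform)
for _ in range(3):
    ds.collect()
uncached_runs = compute_count            # 3000: recomputed each iteration

compute_count = 0
ds.cache()
for _ in range(3):
    ds.collect()
cached_runs = compute_count              # 1000: computed once, then reused

print(uncached_runs, cached_runs)
```

In real Spark the analogous call is `rdd.persist()` (or `rdd.cache()`) before the iterative loop; iterative ML workloads like this one are exactly the case those calls are designed for.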

You can find some additional information here:

  • RDD Persistence
  • Tuning Spark

