Debugging "Managed memory leak detected" in Spark 1.6.0

Problem description

I've tried upgrading to Apache Spark 1.6.0 RC3. My application now spams these errors for nearly every task:

Managed memory leak detected; size = 15735058 bytes, TID = 830

I've set the logging level for org.apache.spark.memory.TaskMemoryManager to DEBUG.
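With Spark's default log4j setup, that can be done by adding a line like the following to conf/log4j.properties (the logger name is just the fully qualified class):

    log4j.logger.org.apache.spark.memory.TaskMemoryManager=DEBUG

With that in place, I see entries like these in the logs: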

I2015-12-18 16:54:41,125 TaskSetManager: Starting task 0.0 in stage 7.0 (TID 6, localhost, partition 0,NODE_LOCAL, 3026 bytes)
I2015-12-18 16:54:41,125 Executor: Running task 0.0 in stage 7.0 (TID 6)
I2015-12-18 16:54:41,130 ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
I2015-12-18 16:54:41,130 ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
D2015-12-18 16:54:41,188 TaskMemoryManager: Task 6 acquire 5.0 MB for null
I2015-12-18 16:54:41,199 ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
I2015-12-18 16:54:41,199 ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
D2015-12-18 16:54:41,262 TaskMemoryManager: Task 6 acquire 5.0 MB for null
D2015-12-18 16:54:41,397 TaskMemoryManager: Task 6 release 5.0 MB from null
E2015-12-18 16:54:41,398 Executor: Managed memory leak detected; size = 5245464 bytes, TID = 6

How do you debug these errors? Is there a way to log stack traces for allocations and deallocations, so I can find what leaks?

I don't know much about the new unified memory manager (SPARK-10000). Is the leak likely my fault or is it likely a Spark bug?

Answer

The short answer is that users are not supposed to see this message. Users are not supposed to be able to create memory leaks in the unified memory manager.

That such leaks happen is a Spark bug: SPARK-11293

But if you want to track down the cause of a memory leak, here is how I did it:

  1. Download the Spark source code and make sure you can build it and that your build works.
  2. In TaskMemoryManager.java, add extra logging in acquireExecutionMemory and releaseExecutionMemory: logger.error("stack trace:", new Exception()); (see the sketch after this list).
  3. Change all the other debug logs in TaskMemoryManager.java to error. (Easier than figuring out the logging configuration...)
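For concreteness, here is a sketch of what step 2 looks like inside TaskMemoryManager.java. The parameter lists are abbreviated below, so match them against the actual 1.6 source; only the added line matters:

    // TaskMemoryManager.java (Spark 1.6 source tree), sketch of step 2.
    public long acquireExecutionMemory(long required, /* ... */ MemoryConsumer consumer) {
      // Added: logging a freshly created Exception prints the full stack
      // trace of the allocation site without throwing anything.
      logger.error("stack trace:", new Exception());
      // ... original method body unchanged ...
    }

    public void releaseExecutionMemory(long size, /* ... */ MemoryConsumer consumer) {
      // Added: same trick for the deallocation site.
      logger.error("stack trace:", new Exception());
      // ... original method body unchanged ...
    }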

Now you will see the full stack trace for all allocations and deallocations. Try to match them up and find the allocations without deallocations. You now have the stack trace for the source of the leak.
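If matching the traces by eye gets tedious, a small script can pair up the acquire/release lines first, so only the unbalanced tasks need inspecting. The sketch below is a hypothetical helper (the class name and the log-line pattern are my own assumptions, based on the DEBUG output shown above, not anything shipped with Spark):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper: tallies "Task N acquire/release X MB" lines from an
    // executor log and reports tasks whose memory was never fully released.
    public class LeakMatcher {
        private static final Pattern LINE =
            Pattern.compile("Task (\\d+) (acquire|release) ([\\d.]+) MB");

        public static void main(String[] args) throws IOException {
            Map<String, Double> netMb = new HashMap<>();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    Matcher m = LINE.matcher(line);
                    if (m.find()) {
                        double mb = Double.parseDouble(m.group(3));
                        // acquire counts up, release counts down
                        netMb.merge("Task " + m.group(1),
                                    m.group(2).equals("acquire") ? mb : -mb,
                                    Double::sum);
                    }
                }
            }
            netMb.forEach((task, mb) -> {
                if (mb > 0.0) System.out.printf("%s never released %.1f MB%n", task, mb);
            });
        }
    }

Run it against an executor log (javac LeakMatcher.java && java LeakMatcher executor.log); any task it prints acquired memory it never released, and that task's acquire stack traces are the leak candidates.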
