Debugging "Managed memory leak detected" in Spark 1.6.0

Problem description

I've tried upgrading to Apache Spark 1.6.0 RC3. My application now spams these errors for nearly every task:

Managed memory leak detected; size = 15735058 bytes, TID = 830

I've set the logging level for org.apache.spark.memory.TaskMemoryManager to DEBUG and see in the logs:

I2015-12-18 16:54:41,125 TaskSetManager: Starting task 0.0 in stage 7.0 (TID 6, localhost, partition 0,NODE_LOCAL, 3026 bytes)
I2015-12-18 16:54:41,125 Executor: Running task 0.0 in stage 7.0 (TID 6)
I2015-12-18 16:54:41,130 ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
I2015-12-18 16:54:41,130 ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
D2015-12-18 16:54:41,188 TaskMemoryManager: Task 6 acquire 5.0 MB for null
I2015-12-18 16:54:41,199 ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
I2015-12-18 16:54:41,199 ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
D2015-12-18 16:54:41,262 TaskMemoryManager: Task 6 acquire 5.0 MB for null
D2015-12-18 16:54:41,397 TaskMemoryManager: Task 6 release 5.0 MB from null
E2015-12-18 16:54:41,398 Executor: Managed memory leak detected; size = 5245464 bytes, TID = 6
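
For reference, in a stock Spark 1.6 deployment this logger level can be enabled with a single line in conf/log4j.properties (log4j 1.x syntax; adjust the location if your cluster manages logging differently):

log4j.logger.org.apache.spark.memory.TaskMemoryManager=DEBUG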

How do you debug these errors? Is there a way to log stack traces for allocations and deallocations, so I can find what leaks?

I don't know much about the new unified memory manager (SPARK-10000). Is the leak likely my fault or is it likely a Spark bug?

Recommended answer

The short answer is that users are not supposed to see this message. Users are not supposed to be able to create memory leaks in the unified memory manager.

That such leaks happen is a Spark bug: SPARK-11293

But if you want to understand the cause of a memory leak, this is how I did it.

  1. Download the Spark source code and make sure you can build it and that your build works.
  2. In TaskMemoryManager.java add extra logging in acquireExecutionMemory and releaseExecutionMemory: logger.error("stack trace:", new Exception()); (a sketch of the idea follows this list).
  3. Change all the other debug logs to error in TaskMemoryManager.java. (Easier than figuring out the logging configuration...)
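
To make the patch concrete, here is a minimal, self-contained sketch of the technique. It is not Spark's actual TaskMemoryManager; the class name and demo sizes are hypothetical. It only shows how logging a freshly created Exception captures the caller's full stack trace on every acquire and release:

import java.util.logging.Level;
import java.util.logging.Logger;

// Toy stand-in for TaskMemoryManager; illustrates the logging patch only.
public class TrackedMemoryManager {
    private static final Logger logger =
        Logger.getLogger(TrackedMemoryManager.class.getName());
    private long held = 0;

    public long acquireExecutionMemory(long required) {
        // A new Exception carries the current call stack, so the log
        // shows exactly who asked for the memory.
        logger.log(Level.SEVERE, "acquire " + required + " bytes, stack trace:", new Exception());
        held += required;
        return required;
    }

    public void releaseExecutionMemory(long size) {
        logger.log(Level.SEVERE, "release " + size + " bytes, stack trace:", new Exception());
        held -= size;
    }

    public static void main(String[] args) {
        TrackedMemoryManager m = new TrackedMemoryManager();
        m.acquireExecutionMemory(5 * 1024 * 1024);
        m.acquireExecutionMemory(5 * 1024 * 1024);
        m.releaseExecutionMemory(5 * 1024 * 1024); // one acquire left unmatched
        System.out.println("net bytes still held: " + m.held);
    }
}

The same one-liner (logger.error("stack trace:", new Exception()) with Spark's own logger) dropped into the real acquireExecutionMemory and releaseExecutionMemory produces the traces described below.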

Now you will see the full stack trace for all allocations and deallocations. Try to match them up and find the allocations without deallocations. You now have the stack trace for the source of the leak.
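
If matching the traces by eye gets tedious, a rough helper along these lines can do a first pass. It is a hypothetical sketch that assumes the DEBUG line format quoted above ("Task 6 acquire 5.0 MB for null"); pipe an executor log into stdin and it reports acquires with no matching release:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// First-pass matcher for TaskMemoryManager acquire/release log lines.
public class LeakMatcher {
    private static final Pattern LINE =
        Pattern.compile("Task (\\d+) (acquire|release) ([\\d.]+ [KMG]?B)");

    public static void main(String[] args) throws Exception {
        // Net count of acquires minus releases, keyed by task id and size.
        Map<String, Integer> net = new HashMap<>();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        for (String line; (line = in.readLine()) != null; ) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) continue;
            String key = "task " + m.group(1) + ", " + m.group(3);
            net.merge(key, m.group(2).equals("acquire") ? 1 : -1, Integer::sum);
        }
        // Any positive entry is an allocation that was never released;
        // its stack trace in the log points at the leak.
        net.forEach((key, count) -> {
            if (count > 0) System.out.println(key + ": " + count + " unmatched acquire(s)");
        });
    }
}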
