How do I figure out the size of specific RDDs in the cache?
Question
I am frequently dealing with containers getting killed by YARN for exceeding memory limits. I suspect it has to do with caching/unpersisting RDDs/DataFrames in an inefficient manner.
What is the best way to debug this type of issue?
I have looked at the "Storage" tab in the Spark Web UI, but the "RDD Names" don't get any more descriptive than "MapPartitionsRDD" or "UnionRDD". How do I figure out which specific RDDs take up the most space in the cache?
In order to figure out the Out of Memory errors, I will need to figure out which RDDs are taking up the most space in the cache. I also want to be able to track when they get unpersisted.
Answer
- For `RDD`s you can set meaningful names using the `setName` method:

  ```scala
  val rdd: RDD[T] = ???
  rdd.setName("foo")
  ```
- For catalog-backed tables:

  ```scala
  val df: DataFrame = ???
  df.createOrReplaceTempView("foo")
  spark.catalog.cacheTable("foo")
  ```

  the name in the catalog will be reflected in both the UI and `SparkContext.getPersistentRDDs`.
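Once the cached entries have meaningful names, you can also rank them by size programmatically instead of eyeballing the Storage tab. A minimal sketch, not part of the original answer, assuming an active `SparkContext` named `sc` (note that `getRDDStorageInfo` is a developer API, so its exact shape may vary between Spark versions):

```scala
// List cached RDDs, largest in-memory footprint first.
// getRDDStorageInfo only reports RDDs that currently have cached partitions.
sc.getRDDStorageInfo
  .sortBy(info => -info.memSize)
  .foreach { info =>
    println(f"${info.id}%4d ${info.name}%-40s " +
            f"mem=${info.memSize}%,d B disk=${info.diskSize}%,d B " +
            f"cached ${info.numCachedPartitions}/${info.numPartitions} partitions")
  }
```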
I am not aware of any solution which works for standalone `Datasets`.
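The question also asks about tracking when RDDs get unpersisted. One option, not covered in the answer above, is to register a `SparkListener`. A hedged sketch, again assuming an active `SparkContext` named `sc`:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerUnpersistRDD}

// Log every unpersist event. SparkListenerUnpersistRDD only carries the
// RDD id, so correlate it with the ids/names listed above.
sc.addSparkListener(new SparkListener {
  override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = {
    println(s"RDD ${event.rddId} was unpersisted")
  }
})
```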