Is there an API function to display "Fraction Cached" for an RDD?
Question
On the Storage tab of the PySparkShell application UI ([server]:8088) I can see information about an RDD I am using. One of the columns is Fraction Cached.
How can I retrieve this percentage programmatically?
I can use getStorageLevel() to get some information about RDD caching, but not Fraction Cached.
Do I have to compute it myself?
Answer
SparkContext.getRDDStorageInfo is probably the thing you're looking for. It returns an Array of RDDInfo, which provides information about:
- Memory size.
- Total number of partitions.
- Number of cached partitions.
It is not directly exposed in PySpark, so you'll have to be a bit creative:
from operator import truediv

# getRDDStorageInfo lives on the underlying Java SparkContext, so reach
# through the py4j gateway (sc._jsc) to call it.
storage_info = sc._jsc.sc().getRDDStorageInfo()

# Build one summary dict per RDD; truediv keeps the division
# floating-point even on Python 2.
[{
    "memSize": s.memSize(),
    "numPartitions": s.numPartitions(),
    "numCachedPartitions": s.numCachedPartitions(),
    "fractionCached": truediv(s.numCachedPartitions(), s.numPartitions())
} for s in storage_info]
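If you only care about one RDD, a small pure-Python helper can pick the value out of summaries shaped like the ones above. The `fraction_cached` helper and the `"name"` key are my own additions, not part of the Spark API (though RDDInfo does expose a `name()` method you could use to populate it), and the sample data below is purely illustrative:

```python
def fraction_cached(storage_entries, rdd_name):
    """Return fractionCached for the entry whose "name" matches rdd_name,
    or None when no such RDD is listed."""
    for entry in storage_entries:
        if entry.get("name") == rdd_name:
            return entry.get("fractionCached")
    return None

# Illustrative sample only; real entries come from getRDDStorageInfo.
sample = [{
    "name": "lines",
    "memSize": 1024,
    "numPartitions": 8,
    "numCachedPartitions": 6,
    "fractionCached": 6 / 8,
}]

print(fraction_cached(sample, "lines"))    # 0.75
print(fraction_cached(sample, "missing"))  # None
```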
If you have access to the REST API, you can of course use it directly:
import requests

# Base URL of the application's REST API; host and port must point at the
# Spark UI (4040 by default) or the history server.
url = "http://{0}:{1}/api/v1/applications/{2}/storage/rdd/".format(
    host, port, sc.applicationId
)

# Fetch the list of cached RDDs, then the detailed record for each one,
# keeping only the successful responses.
[r.json() for r in [
    requests.get("{0}{1}".format(url, rdd.get("id"))) for
    rdd in requests.get(url).json()
] if r.status_code == 200]
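The per-RDD JSON returned by `/storage/rdd/[rdd-id]` includes `numPartitions` and `numCachedPartitions` fields, so the fraction can also be computed from one of those response bodies. A minimal sketch, with a helper name of my own and an illustrative (not real) sample response:

```python
def rest_fraction_cached(rdd_json):
    """Compute Fraction Cached from one /storage/rdd/[id] response body,
    guarding against a zero partition count."""
    total = rdd_json["numPartitions"]
    return rdd_json["numCachedPartitions"] / total if total else 0.0

# Illustrative shape only; a real response carries more fields
# (storageLevel, memoryUsed, diskUsed, ...).
sample_response = {"id": 0, "name": "lines",
                   "numPartitions": 8, "numCachedPartitions": 8}

print(rest_fraction_cached(sample_response))  # 1.0
```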