Where is df.cache() stored?
Question
I would like to understand on which node (driver or worker/executor) the data cached by the code below is stored:
df.cache() //df is a large dataframe (200GB)
Also, which performs better: the SQL cacheTable or cache()? My understanding is that one of them is lazy and the other is eager.
Recommended answer
df.cache() calls the persist() method, which uses the MEMORY_AND_DISK storage level by default. To use a different storage level, call persist() with that level explicitly. In either case the cached partitions are held by the executors, not by the driver.
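As a rough illustration of the default level versus an explicit one, here is a minimal sketch. It assumes a local SparkSession and a small generated DataFrame standing in for the 200GB one; the app name and the value names are made up for the example. It can be pasted into spark-shell or run as a Scala script.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Assumption: local mode, small stand-in data instead of the real 200GB DataFrame.
val spark = SparkSession.builder().master("local[*]").appName("cache-levels").getOrCreate()
val df = spark.range(0, 1000000).toDF("id")

// cache() is shorthand for persist() with the default storage level.
df.cache()
val defaultLevel = df.storageLevel
println(s"after cache(): $defaultLevel")

// Drop the cache entry before registering the DataFrame with another level.
df.unpersist(blocking = true)

// persist(level) lets you pick the storage level yourself.
df.persist(StorageLevel.DISK_ONLY)
val explicitLevel = df.storageLevel
println(s"after persist(DISK_ONLY): $explicitLevel")

df.unpersist(blocking = true)
```

df.storageLevel only reports the level the DataFrame is registered with; it does not tell you whether the data has already been materialized.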
The persist() method calls sparkSession.sharedState.cacheManager.cacheQuery(), and if you look at the code for cacheTable, you will see that it calls the same sparkSession.sharedState.cacheManager.cacheQuery().
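From the user's side, the cacheTable path looks roughly like the sketch below. The view name "events" and the local session are assumptions for the example, not part of the original question.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode and a small generated DataFrame.
val spark = SparkSession.builder().master("local[*]").appName("catalog-cache").getOrCreate()
val df = spark.range(0, 1000000).toDF("id")

// cacheTable works on a name, so the DataFrame is registered as a temp view first.
df.createOrReplaceTempView("events") // hypothetical view name

// Registers the view's plan with the cache manager, like df.cache() does for df.
spark.catalog.cacheTable("events")
val viewCached = spark.catalog.isCached("events")
println(s"events registered for caching: $viewCached")

spark.catalog.uncacheTable("events")
```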
That means both are the same and are lazily evaluated (the data is cached only once an action is performed), except that the persist() method can take a storage level. These are the available storage levels:
- NONE
- DISK_ONLY
- DISK_ONLY_2
- MEMORY_ONLY
- MEMORY_ONLY_2
- MEMORY_ONLY_SER
- MEMORY_ONLY_SER_2
- MEMORY_AND_DISK
- MEMORY_AND_DISK_2
- MEMORY_AND_DISK_SER
- MEMORY_AND_DISK_SER_2
- OFF_HEAP
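The lazy behaviour described above, combined with one of the listed storage levels, can be sketched as follows. The data, the column name "bucket" and the choice of MEMORY_AND_DISK_SER are assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Assumption: local mode and generated data with a hypothetical "bucket" column.
val spark = SparkSession.builder().master("local[*]").appName("lazy-persist").getOrCreate()
val df = spark.range(0, 1000000).selectExpr("id", "id % 10 AS bucket")

// Only marks the DataFrame for caching; nothing is computed or stored yet.
df.persist(StorageLevel.MEMORY_AND_DISK_SER)

// The first action computes the partitions and stores them on the executors.
val rows = df.count()

// Later actions on the same DataFrame can read from the cached partitions.
val perBucket = df.groupBy("bucket").count().collect()

println(s"rows = $rows, buckets = ${perBucket.length}")

// Release the cached data once it is no longer needed.
df.unpersist()
```

A count() right after persist() is a common way to force the cache to fill before the queries that depend on it.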
You can also use the SQL CACHE TABLE statement, which is not lazily evaluated: it caches the whole table as soon as the statement runs, which may lead to OOM for a large table.
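A minimal sketch of the SQL statement is shown below. Spark SQL also accepts CACHE LAZY TABLE, which defers the work to the first action. The view names and the local session are assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode and two small generated views with hypothetical names.
val spark = SparkSession.builder().master("local[*]").appName("sql-cache-table").getOrCreate()
spark.range(0, 1000000).toDF("id").createOrReplaceTempView("events")
spark.range(0, 500000).toDF("id").createOrReplaceTempView("events_lazy")

// Eager: the table is scanned and cached as part of running this statement.
spark.sql("CACHE TABLE events")

// Lazy variant: only registers the table; caching happens on first use.
spark.sql("CACHE LAZY TABLE events_lazy")

val eagerCached = spark.catalog.isCached("events")
val lazyCached = spark.catalog.isCached("events_lazy")

// First action on the lazily cached table fills its cache.
val total = spark.table("events_lazy").count()
println(s"eager: $eagerCached, lazy: $lazyCached, rows in events_lazy: $total")

spark.sql("UNCACHE TABLE events")
spark.sql("UNCACHE TABLE events_lazy")
```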
Summary: cache(), persist(), and cacheTable() are lazily evaluated and take effect only once an action is performed, whereas SQL CACHE TABLE is eager.
Choose whichever fits your requirements.
Hope this helps!