spark.sql.hive.filesourcePartitionFileCacheSize


Question

Has anyone come across this warning?

18/01/10 19:52:56 WARN SharedInMemoryCache: Evicting cached table partition metadata from memory due to size constraints
(spark.sql.hive.filesourcePartitionFileCacheSize = 262144000 bytes). This may impact query planning performance

I see it a lot when loading a large DataFrame with many partitions from S3 into Spark.

It has never actually caused a problem for the job. I would just like to know what this config property is for and how to tune it properly.

Thanks

Recommended answer

This is a Spark-Hive specific config property. When nonzero, it enables caching of partition file metadata in memory. All tables share one cache, which can use up to the specified number of bytes for file metadata. The setting only has an effect when Hive filesource partition management (spark.sql.hive.manageFilesourcePartitions) is enabled. The warning means the cache reached that limit and dropped some entries, which, as the message itself says, may slow down query planning.
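To make the eviction behaviour more concrete, here is a toy model of a byte-bounded cache shared by several tables. It is not Spark's implementation: the class name ByteBoundedCache, the least-recently-used eviction order, and the entry sizes are all assumptions chosen for illustration. It only shows why a full cache has to drop older entries when new partition metadata arrives.

```python
from collections import OrderedDict


class ByteBoundedCache:
    """Toy shared cache (hypothetical, not Spark code).

    Holds entries up to max_bytes in total and evicts the least
    recently used ones once that limit is exceeded.
    """

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> (value, size_bytes)
        self.evicted = []             # keys dropped due to the size limit

    def put(self, key, value, size_bytes):
        # Replace an existing entry for the same key, if any.
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)[1]
        self.entries[key] = (value, size_bytes)
        self.used_bytes += size_bytes
        # Evict oldest entries until the cache fits the limit again.
        while self.used_bytes > self.max_bytes and self.entries:
            old_key, (_, old_size) = self.entries.popitem(last=False)
            self.used_bytes -= old_size
            self.evicted.append(old_key)

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as recently used
        return self.entries[key][0]


# Two tables share one 100-byte cache; the second entry does not fit
# alongside the first, so the first one is evicted.
cache = ByteBoundedCache(max_bytes=100)
cache.put("table_a/day=1", ["file1", "file2"], size_bytes=60)
cache.put("table_b/day=1", ["file3"], size_bytes=60)
print(cache.evicted)
```

In this model, a later lookup of the evicted partition misses the cache and its file listing has to be fetched again, which is the kind of extra planning cost the warning refers to.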

The Spark source code defines it as shown below. The default is 250 * 1024 * 1024 bytes (262144000, the value shown in your warning), and you can change it through the SparkConf object in your code or on the spark-submit command line.

Spark source code

val HIVE_FILESOURCE_PARTITION_FILE_CACHE_SIZE =
    buildConf("spark.sql.hive.filesourcePartitionFileCacheSize")
      .doc("When nonzero, enable caching of partition file metadata in memory. All tables share " +
           "a cache that can use up to specified num bytes for file metadata. This conf only " +
           "has an effect when hive filesource partition management is enabled.")
      .longConf
      .createWithDefault(250 * 1024 * 1024)
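
Going by the default above, a small sketch like the following can work out the byte value for a given size and build the matching spark-submit flag. The helper names cache_size_bytes and spark_submit_conf are hypothetical, and the 500 MiB figure is only an example, not a recommended value.

```python
MIB = 1024 * 1024
CONF_KEY = "spark.sql.hive.filesourcePartitionFileCacheSize"


def cache_size_bytes(mib):
    """Convert a size in MiB to the byte count the property expects."""
    return mib * MIB


def spark_submit_conf(mib):
    """Build the --conf arguments for spark-submit (hypothetical helper)."""
    return ["--conf", f"{CONF_KEY}={cache_size_bytes(mib)}"]


# The default from the source: 250 MiB, the same number as in the warning.
default_bytes = cache_size_bytes(250)
print(default_bytes)  # 262144000

# Example: double the cache to 500 MiB (illustrative value only).
doubled_flag = spark_submit_conf(500)
print(" ".join(doubled_flag))
# --conf spark.sql.hive.filesourcePartitionFileCacheSize=524288000
```

The same key and value can also be set in code, for example with SparkConf.set or SparkSession.builder.config, before the session is created. A larger value trades driver memory for fewer evictions, and per the property's description a value of 0 turns the cache off.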
