How to avoid generating empty .deflate files for a Hive query?
Problem description
When I run a Hive query, a large number of empty .deflate files are generated (they are actually about 8 bytes each, which I think is the minimum size for a .deflate file). I suspect this happens because the query requires a large number of reducers. Is there a way to avoid generating these empty .deflate files?
Thanks in advance,
Lin
.deflate is the file extension written by the default compression codec. Hive has compression settings that can be used to reduce the amount of disk space that its queries use.
When the property hive.exec.compress.output=true, Hive compresses the query output it stores in HDFS using the codec configured by the mapred.output.compression.codec property (mapred.map.output.compression.codec only affects intermediate map output). These properties can be set in hive-site.xml or in the Hive CLI.
To enable output compression from the Hive CLI:
hive> set hive.exec.compress.output=true;
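For context, the .deflate extension comes from Hadoop's DefaultCodec. As a rough sketch, assuming your Hadoop version still honors the older mapred.* property names (newer releases use mapreduce.output.fileoutputformat.compress.codec), you could inspect the current settings and choose a different codec for the final output in the same session. GzipCodec is used here only as an illustration:

```sql
-- Print the current values (Hive echoes name=value, or reports the name as undefined).
set hive.exec.compress.output;
set mapred.output.compression.codec;

-- Assumption: GzipCodec is available on the cluster; output files then end in .gz.
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```

Changing the codec only changes the extension and format of the output files, not how many of them are written.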
To enable output compression using hive-site.xml:
<property>
<name>hive.exec.compress.output</name>
<value>true</value>
</property>
So, to stop Hive from writing .deflate files, disable output compression:
set hive.exec.compress.output=false;
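Note that disabling compression changes the format of the output files rather than their number, so reducers that receive no rows may still leave empty (0-byte) files behind. The following is a hedged sketch of session settings that are commonly tried for this, assuming the MapReduce execution engine. The numeric values are illustrative placeholders, not recommendations from the answer above, and whether they remove every empty file depends on your Hive version and query:

```sql
-- Write uncompressed output (no .deflate extension).
set hive.exec.compress.output=false;

-- Assumption: capping the reducer count yields fewer (possibly empty) output files.
-- 32 is an illustrative value only.
set hive.exec.reducers.max=32;

-- Assumption: let Hive merge small output files at the end of the job.
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
-- Illustrative threshold in bytes: merge when the average output file is smaller than this.
set hive.merge.smallfiles.avgsize=16000000;
```

If the merge settings take effect, Hive launches an extra merge step after the query, which trades some run time for fewer output files.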