Spark Small ORC Stripes

Problem description

We use Spark to flatten clickstream data and then write it to S3 in ORC+zlib format. I have tried changing many Spark settings, but the stripes in the resulting ORC files are still very small (<2MB).

Things I have tried so far to increase the stripe size:

Earlier each file was 20MB in size. Using coalesce, I now create files of 250-300MB, but each file still has 200 stripes, i.e. each stripe is <2MB.

Tried using HiveContext instead of SparkContext and setting hive.exec.orc.default.stripe.size to 67108864, but Spark isn't honoring this parameter.
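For comparison with the Hive setting above, here is a minimal PySpark-style sketch that passes the stripe size as a per-write option. The helper names `orc_writer_options` and `write_orc` are hypothetical, and the sketch assumes a Spark build whose ORC writer reads the ORC configuration key `orc.stripe.size` from writer options. On builds that bundle the Hive 1.2.1 library, as described in the answer below, the option may be ignored just like the Hive parameter.

```python
# Sketch only: helper names are hypothetical, and whether the writer honors
# "orc.stripe.size" depends on the Spark/ORC version in use (an assumption).

def orc_writer_options(stripe_size_bytes: int = 64 * 1024 * 1024) -> dict:
    """Build ORC writer options requesting zlib and a target stripe size."""
    return {
        "compression": "zlib",                      # ORC+zlib, as in the question
        "orc.stripe.size": str(stripe_size_bytes),  # 64 MiB = 67108864 bytes
    }


def write_orc(df, path: str, stripe_size_bytes: int = 64 * 1024 * 1024) -> None:
    """Write a Spark DataFrame `df` to `path` as ORC with the options above."""
    df.write.options(**orc_writer_options(stripe_size_bytes)).orc(path)
```

The default of 64 * 1024 * 1024 bytes equals the 67108864 value tried in the question. After writing, the actual stripe layout still needs to be checked on the output files, since a silently ignored option produces no error.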

So, any idea how I can increase the stripe size of the ORC files being created? The problem with small stripes is that when we query these ORC files with Presto and the stripe size is less than 8MB, Presto reads the whole data file instead of only the fields selected in the query.
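The figures quoted in the question give a rough average stripe size, which can be compared against the 8MB threshold mentioned for Presto. The short calculation below is only illustrative arithmetic. The function name and constant are hypothetical, and it assumes stripes are roughly equal in size.

```python
PRESTO_THRESHOLD_MB = 8  # threshold quoted in the question


def average_stripe_mb(file_size_mb: float, stripe_count: int) -> float:
    """Return the mean stripe size in MB for one ORC file."""
    if stripe_count <= 0:
        raise ValueError("stripe_count must be positive")
    return file_size_mb / stripe_count


# File sizes and stripe count as reported in the question.
for size_mb in (250, 300):
    avg = average_stripe_mb(size_mb, 200)
    print(f"{size_mb} MB / 200 stripes = {avg:.2f} MB per stripe, "
          f"below threshold: {avg < PRESTO_THRESHOLD_MB}")
```

Both cases come out between 1.25MB and 1.5MB per stripe, consistent with the "<2MB" observation and well under the 8MB figure.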

Thread related to the Presto stripe issue: https://groups.google.com/forum/#!topic/presto-users/7NcrFvGpPaA

Recommended answer

I posted the same question on the HDP Community platform and got the response below:

"It's related to HIVE-13232 (fixed in Hive 1.3.0, 2.0.1, 2.1.0), but all Apache Spark releases still use the Hive 1.2.1 library.

Could you try HDP 2.6.3+ (2.6.4 is the latest one)? HDP Spark 2.2 has that fixed Hive library."
