Parquet vs ORC vs ORC with Snappy

Published: 2021/12/28 23:32:47 | Tags: hadoop, hive, parquet, snappy, orc


Question

I am running a few tests on the storage formats available in Hive, with Parquet and ORC as the main options. I tested ORC twice: once with its default compression and once with Snappy.

I have read many documents stating that Parquet is better than ORC in time/space complexity, but my tests show the opposite of what those documents say.

Here are some details of my data.

Table A - Text File Format - 2.5 GB

Table B - ORC - 652 MB

Table C - ORC with Snappy - 802 MB

Table D - Parquet - 1.9 GB
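To make the size comparison concrete, the figures above can be turned into compression ratios relative to the text table. This is only a rough sketch: the sizes are the ones reported in the question, the conversion of 1 GB to 1024 MB is an assumption, and `compression_ratios` is a hypothetical helper name used for illustration.

```python
# Table sizes as reported in the question, converted to MB.
# Assumption: 1 GB = 1024 MB (the question does not say which unit convention it uses).
sizes_mb = {
    "Text": 2.5 * 1024,
    "ORC": 652,
    "ORC with Snappy": 802,
    "Parquet": 1.9 * 1024,
}


def compression_ratios(sizes, baseline="Text"):
    """Return the baseline size divided by each format's size."""
    base = sizes[baseline]
    return {name: base / size for name, size in sizes.items()}


ratios = compression_ratios(sizes_mb)
for name, ratio in ratios.items():
    # A larger ratio means the table is smaller relative to the text table.
    print(f"{name}: {ratio:.2f}x")
```

Under these assumptions, the ORC table comes out at a little under 4x smaller than the text table, while the Parquet table is only about 1.3x smaller.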

Parquet was the worst as far as compression of my table is concerned.

My tests with the above tables yielded the following results.

Row count operation

Text Format Cumulative CPU - 123.33 sec

Parquet Format Cumulative CPU - 204.92 sec

ORC Format Cumulative CPU - 119.99 sec 

ORC with SNAPPY Cumulative CPU - 107.05 sec

Sum of a column operation

Text Format Cumulative CPU - 127.85 sec   

Parquet Format Cumulative CPU - 255.2 sec   

ORC Format Cumulative CPU - 120.48 sec   

ORC with SNAPPY Cumulative CPU - 98.27 sec

Average of a column operation

Text Format Cumulative CPU - 128.79 sec

Parquet Format Cumulative CPU - 211.73 sec    

ORC Format Cumulative CPU - 165.5 sec   

ORC with SNAPPY Cumulative CPU - 135.45 sec

Selecting 4 columns from a given range using a where clause

Text Format Cumulative CPU - 72.48 sec

Parquet Format Cumulative CPU - 136.4 sec       

ORC Format Cumulative CPU - 96.63 sec 

ORC with SNAPPY Cumulative CPU - 82.05 sec
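The four result sets can be put side by side by expressing each format's cumulative CPU time as a multiple of the plain ORC time for the same query. This is a minimal sketch using only the numbers reported above; the query labels and the `relative_to` helper are hypothetical names chosen for illustration. The figures are cumulative CPU seconds, not wall-clock response times.

```python
# Cumulative CPU seconds per query and format, copied from the results above.
cpu_sec = {
    "row count": {"Text": 123.33, "Parquet": 204.92, "ORC": 119.99, "ORC with Snappy": 107.05},
    "column sum": {"Text": 127.85, "Parquet": 255.2, "ORC": 120.48, "ORC with Snappy": 98.27},
    "column average": {"Text": 128.79, "Parquet": 211.73, "ORC": 165.5, "ORC with Snappy": 135.45},
    "4-column range select": {"Text": 72.48, "Parquet": 136.4, "ORC": 96.63, "ORC with Snappy": 82.05},
}


def relative_to(times, baseline):
    """Divide every format's time by the baseline format's time, per query."""
    return {
        query: {fmt: secs / row[baseline] for fmt, secs in row.items()}
        for query, row in times.items()
    }


relative = relative_to(cpu_sec, "ORC")
for query, row in relative.items():
    # Values above 1.0 mean more CPU time than plain ORC for that query.
    print(query, {fmt: round(value, 2) for fmt, value in row.items()})
```

In these measurements, Parquet used more cumulative CPU time than plain ORC on every query, and ORC with Snappy used less.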

Does that mean ORC is faster than Parquet? Or is there something I can do to make Parquet perform better in terms of query response time and compression ratio?

Thanks!
