How can we decide the total number of buckets for a Hive table?


Question

I am a bit new to Hadoop. As far as I know, buckets are a fixed number of partitions in a Hive table, and Hive uses as many reducers as the number of buckets defined when the table was created. Can anyone tell me how to calculate the total number of buckets for a Hive table? Is there a formula for it?

Solution

If you want to know how many buckets to specify in your CLUSTERED BY ... INTO n BUCKETS clause, I believe it is good to choose a number that results in buckets that are at or just below your HDFS block size.

This should help avoid ending up with many small, mostly empty files, each of which still costs NameNode memory for its metadata.

Also choose a number that is a power of two.

You can check your HDFS block size with:

hdfs getconf -confKey dfs.blocksize
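
As a rough illustration of the rule of thumb above, here is a minimal Python sketch. It is not an official Hive formula: the function name `suggest_bucket_count` is made up for this example, the 10 GiB table size and 128 MiB block size are assumed values, and it assumes rows spread evenly across buckets (a skewed bucketing column would break that). It divides the table size by the block size and rounds up to the next power of two, so the average bucket lands at or below one block.

```python
def suggest_bucket_count(table_size_bytes: int, block_size_bytes: int) -> int:
    """Smallest power of two that keeps the average bucket at or below one block.

    Hypothetical helper for illustration only; assumes evenly distributed data.
    """
    if table_size_bytes <= 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Minimum bucket count so that the average bucket fits in one block
    # (integer ceiling division).
    min_buckets = (table_size_bytes + block_size_bytes - 1) // block_size_bytes
    # Round up to the next power of two; a value that already is one is kept.
    return 1 << (min_buckets - 1).bit_length()


# Assumed inputs: the block size in bytes, as printed by
# `hdfs getconf -confKey dfs.blocksize`, and an estimated table size.
block_size = 134217728        # 128 MiB (assumed value)
table_size = 10 * 1024 ** 3   # 10 GiB (assumed value)

print(suggest_bucket_count(table_size, block_size))  # prints 128
```

With these assumed numbers, 10 GiB / 128 MiB gives 80 buckets, which rounds up to 128, so the table definition would use something like `CLUSTERED BY (some_column) INTO 128 BUCKETS`, where `some_column` is a placeholder for your bucketing column.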
