How to sample for each group in hive?

Views: 91  Published: 2020/11/22 2:32:23  Tags: hadoop, hive, hiveql


Problem description

I have a large table in Hive with 1.5 billion+ rows. One of the columns is category_id, which has ~20 distinct values. I want to sample the table so that I get 1 million rows for each category.

I checked out "Random sample table with Hive, but including matching rows" and "Hive: Creating smaller table from big table", and I figured out how to get a random sample from the entire table, but I still can't figure out how to get a sample for each category_id.
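One commonly used way to express a fixed-size sample per group in HiveQL is to number the rows inside each category in random order, e.g. `ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY rand())`, and keep only rows whose number is at most 1,000,000. The Python sketch below illustrates that same idea on an in-memory iterable rather than a Hive table: each row gets a random key, and for every group only the k rows with the smallest keys are kept (a bounded heap per group, so nothing is fully sorted). The names `sample_per_group` and `group_key` are hypothetical, chosen for illustration only.

```python
import heapq
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple, TypeVar

Row = TypeVar("Row")


def sample_per_group(
    rows: Iterable[Row],
    group_key: Callable[[Row], Hashable],
    k: int,
    seed: Optional[int] = None,
) -> Dict[Hashable, List[Row]]:
    """Hypothetical helper: keep up to k uniformly random rows per group.

    Equivalent to "ORDER BY rand() within each group, keep the first k",
    but done in one pass with a size-k heap per group.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    # Per group: a min-heap on -key, so heap[0] holds the LARGEST key kept so far.
    # The row index i breaks ties so rows themselves are never compared.
    heaps: Dict[Hashable, List[Tuple[float, int, Row]]] = defaultdict(list)
    for i, row in enumerate(rows):
        key = rng.random()  # the random sort key, like rand() in HiveQL
        heap = heaps[group_key(row)]
        entry = (-key, i, row)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key < -heap[0][0]:
            # New key is smaller than the largest kept key: swap them.
            heapq.heapreplace(heap, entry)
    return {g: [row for _, _, row in heap] for g, heap in heaps.items()}


if __name__ == "__main__":
    data = [(cat, n) for cat in ("a", "b", "c") for n in range(1000)]
    sample = sample_per_group(data, group_key=lambda r: r[0], k=10, seed=42)
    print({g: len(v) for g, v in sample.items()})  # {'a': 10, 'b': 10, 'c': 10}
```

A group with fewer than k rows simply returns all of its rows. Memory grows with (number of groups × k), so for ~20 categories at 1 million each this holds roughly 20 million rows; in Hive the window-function version spreads that work across reducers by partition, though it still shuffles and sorts every row, which is why a coarse `WHERE rand() < p` pre-filter is sometimes added (at the risk of under-filling small categories).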
