Is the LIMIT clause in Hive really random?

Question

The Hive documentation notes that the LIMIT clause returns rows chosen at random. I have been running a SELECT with LIMIT 1 on a table with more than 800,000 records, but it always returns the same record.

I'm using the Shark distribution, and I am wondering whether that has anything to do with this unexpected behavior. Any thoughts would be appreciated.

Thanks, Visakh

Solution

Even though the documentation states it returns rows at random, it's not actually true.

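It returns rows "chosen at random" only in the sense that, without any WHERE or ORDER BY clause, rows come back in whatever order they happen to be read from storage. This means it is not really random (or randomly chosen) as you would think; the order in which rows are returned simply cannot be determined in advance. In practice, the same query over unchanged data usually reads the data in the same order, which is why LIMIT 1 keeps returning the same record.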
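To see why the same record keeps coming back, here is a minimal Python simulation. It is an illustrative model, not Hive's actual code: it assumes the table is stored as an ordered list of files (the hypothetical `files` variable) that are scanned front to back, and that a bare LIMIT stops as soon as it has collected enough rows. Nothing in that process is random, so repeated runs over unchanged data return the same row.

```python
from itertools import islice
from typing import Iterator, List, Tuple

Row = Tuple[int, str]

def scan(files: List[List[Row]]) -> Iterator[Row]:
    """Yield rows in storage order: file by file, row by row (hypothetical model)."""
    for file_rows in files:
        yield from file_rows

def select_limit(files: List[List[Row]], n: int) -> List[Row]:
    """Model of SELECT ... LIMIT n with no WHERE/ORDER BY: keep the first n rows scanned."""
    return list(islice(scan(files), n))

# A toy "table" split across three files.
files = [
    [(1, "alice"), (2, "bob")],
    [(3, "carol")],
    [(4, "dave"), (5, "erin")],
]

print(select_limit(files, 1))  # [(1, 'alice')]
print(select_limit(files, 1))  # [(1, 'alice')] again: unspecified order, not random
```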

As soon as you add ORDER BY x DESC LIMIT 5, the result is fully determined: it returns the 5 rows with the highest values of x (that is, the last 5 rows when sorted by x).
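A small Python model of ORDER BY x DESC LIMIT 5 (an illustrative sketch, not Hive internals) shows that the result no longer depends on storage order. The function name and the use of the first tuple field as the column x are assumptions made for the example.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (x, payload); x plays the role of the ORDER BY column

def order_by_desc_limit(rows: List[Row], n: int) -> List[Row]:
    """Model of ORDER BY x DESC LIMIT n: sort by x descending, keep the first n."""
    return sorted(rows, key=lambda r: r[0], reverse=True)[:n]

rows = [(x, f"row{x}") for x in [7, 3, 9, 1, 12, 5, 8, 10]]
print(order_by_desc_limit(rows, 5))
# [(12, 'row12'), (10, 'row10'), (9, 'row9'), (8, 'row8'), (7, 'row7')]
```

Feeding the same rows in any other order produces the same five rows, which is exactly the determinism the bare LIMIT lacks.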

To get rows returned at random, you need to use something like: ORDER BY rand() LIMIT 1
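A Python model of ORDER BY rand() LIMIT 1 makes its cost visible. This is an illustrative sketch, not Hive's implementation: every row has to receive a random key and the whole keyed set has to be sorted before a single row is kept. The optional `rng` parameter is a hypothetical addition so the sketch can be seeded for testing.

```python
import random
from typing import List, Optional, Tuple

Row = Tuple[int, str]

def order_by_rand_limit(
    rows: List[Row], n: int, rng: Optional[random.Random] = None
) -> List[Row]:
    """Model of ORDER BY rand() LIMIT n: tag every row with a random key, sort, keep n."""
    if rng is None:
        rng = random.Random()
    # Every row gets a fresh random key, so every row must be visited...
    keyed = [(rng.random(), row) for row in rows]
    # ...and the whole keyed list is sorted before the LIMIT is applied.
    keyed.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed[:n]]

rows = [(i, f"row{i}") for i in range(1, 11)]
print(order_by_rand_limit(rows, 1))  # usually a different row on each run
```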

However, this can be slow, because every row has to be assigned a random value and the whole result sorted, and it can be even slower if your indexes aren't set up properly. What I usually do instead is query the minimum and maximum IDs in the table, generate a random number between them, and then select the matching records (in your case, just one record). This tends to be faster than having the database do the work, especially on a large dataset.
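The min/max approach can be sketched in Python as follows. This is a minimal model under stated assumptions, not the answerer's actual code: the table is a hypothetical in-memory dict keyed by an integer ID column, the comments mark where the real MIN/MAX query and the single-row lookup would go, and the retry loop for missing IDs (gaps) is an added assumption the answer does not mention. The sketch only models the logic; actual speed in Hive depends on how the lookup by ID is executed.

```python
import random
from typing import Dict, Optional, Tuple

def random_row_by_id(
    table: Dict[int, str], rng: random.Random, max_tries: int = 100
) -> Optional[Tuple[int, str]]:
    """Pick a random ID between MIN(id) and MAX(id), then fetch that one row.

    Hypothetical helper. Assumes integer IDs; `table` maps id -> row payload.
    """
    lo, hi = min(table), max(table)  # stands in for SELECT MIN(id), MAX(id) FROM t
    for _ in range(max_tries):
        candidate = rng.randint(lo, hi)  # inclusive on both ends
        if candidate in table:  # stands in for SELECT ... WHERE id = candidate
            return candidate, table[candidate]
        # The ID falls in a gap (deleted or never used): draw again.
    return None  # give up after max_tries draws that all hit gaps

table = {1: "alice", 2: "bob", 5: "erin", 9: "ivan"}
print(random_row_by_id(table, random.Random()))
```

Redrawing on a gap keeps the choice uniform over the IDs that actually exist; snapping to the next existing ID instead would favour rows that sit just after a gap. When IDs are dense, the first draw almost always hits.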
