Does Hive preserve file order when selecting data?

Question

If I run select * from table1; in which order will the data be retrieved: file order or random order?

Recommended answer

Without ORDER BY, the order is not guaranteed.

The data is read in parallel by many processes (mappers). After the splits are calculated, each process starts reading a piece of a file or several files, depending on how the splits were calculated.

The parallel processes may each handle a different volume of data and run on different nodes, and the load is not the same from one run to the next. As a result, they start returning rows and finish at different times, depending on many factors such as node load, network load, and the volume of data per process.
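The effect described above can be imitated with a small simulation. This is only a sketch in Python, not Hive code: the names FILE_ROWS, compute_splits, read_split, parallel_select and ordered_select are hypothetical, threads stand in for mappers, and a random sleep stands in for node and network load.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical "file" of rows, stored in a known order.
FILE_ROWS = [(i, f"row-{i}") for i in range(12)]


def compute_splits(rows, split_size):
    """Cut the rows into contiguous splits, one per simulated mapper."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def read_split(split):
    """Simulate a mapper: a random delay stands in for node and network load."""
    time.sleep(random.uniform(0.0, 0.02))
    return list(split)


def parallel_select(rows, split_size=3, workers=4):
    """Return rows in the order in which the simulated mappers finish."""
    result = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(read_split, s)
                   for s in compute_splits(rows, split_size)]
        for future in as_completed(futures):
            result.extend(future.result())
    return result


def ordered_select(rows, **kwargs):
    """Rough analogue of ORDER BY on the first column: sort after collecting."""
    return sorted(parallel_select(rows, **kwargs), key=lambda r: r[0])


if __name__ == "__main__":
    print(parallel_select(FILE_ROWS))  # split order may differ between runs
    print(ordered_select(FILE_ROWS))   # always sorted by the first column
```

Running the script several times should show the splits from parallel_select arriving in varying order, while ordered_select returns the same sequence every time. The rows themselves are never lost or duplicated; only their arrival order changes.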

Removing all of these factors makes the order more predictable. For example, a single-threaded sequential read of one file may return rows in the same order as they appear in the file. But this is not how the database works.
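For contrast, here is a minimal sketch of the single-threaded case, again in plain Python rather than Hive. The helper names write_rows, sequential_read and demo are hypothetical; the point is only that one reader walking one file from start to end sees the rows in file order.

```python
import os
import tempfile


def write_rows(path, rows):
    """Write one row per line, in the given order."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(row + "\n")


def sequential_read(path):
    """Read the file front to back with a single reader."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def demo():
    """Round-trip a few unsorted rows through a temporary file."""
    rows = ["c", "a", "b"]
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_rows(path, rows)
        return sequential_read(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The rows come back in the order they were written, not in sorted order. A distributed engine gives no such guarantee, which is why an explicit ORDER BY is needed whenever the order matters.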

Also, according to Codd's relational theory, the order of columns and rows is immaterial.
