Does Hive preserve file order when selecting data?

Question

If I run "select * from table1;", in which order will the data be retrieved: in file order or in random order?

Answer

Without ORDER BY, the order is not guaranteed.

Data is read in parallel by many processes (mappers). After the input splits are calculated, each process reads a piece of a file or a few files, depending on the splits it was assigned.
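
To make the idea of splits concrete, here is a minimal Python sketch, assuming a deliberately simplified model in which every file is cut into fixed-size byte ranges. Real Hadoop/Hive input formats also take the HDFS block size and the min/max split settings into account, and can combine several small files into one split. The Split type, the compute_splits helper and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Split:
    """A contiguous byte range of one input file (simplified model)."""
    path: str
    start: int
    length: int


def compute_splits(file_sizes: Dict[str, int], split_size: int) -> List[Split]:
    """Cut every file into byte ranges of at most split_size bytes.

    Simplification: ignores block size, min/max split settings and
    small-file combining that real input formats apply.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits: List[Split] = []
    for path, size in file_sizes.items():
        offset = 0
        while offset < size:
            length = min(split_size, size - offset)
            splits.append(Split(path, offset, length))
            offset += length
    return splits


if __name__ == "__main__":
    # Hypothetical file names and sizes in bytes, for illustration only.
    files = {"table1/000000_0": 250, "table1/000001_0": 90}
    for s in compute_splits(files, split_size=100):
        print(s)
```

Each resulting split would then be handed to a separate mapper, which is where the ordering question begins.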

The parallel processes can handle different volumes of data and run on different nodes, and the load differs from run to run. So they start returning rows and finish at different times, depending on many factors such as node load, network load, and the volume of data per process.
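
The effect can be simulated in Python. This is a toy model, not how Hive actually executes a query: each split is handled by a thread that sleeps for a random time (standing in for node load, network load and data volume), and rows are emitted in the order the tasks finish. The names read_split and parallel_select and the sample rows are hypothetical.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Optional, Tuple


def read_split(split_id: int, rows: List[str], delay: float) -> Tuple[int, List[str]]:
    """Toy 'mapper': simulate variable work time, then return the split's rows."""
    time.sleep(delay)  # stands in for node load, network load, data volume, ...
    return split_id, rows


def parallel_select(splits: List[List[str]], workers: int,
                    seed: Optional[int] = None) -> List[str]:
    """Run one task per split; collect rows in the order the tasks complete."""
    rng = random.Random(seed)
    output: List[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(read_split, i, rows, rng.uniform(0.0, 0.05))
            for i, rows in enumerate(splits)
        ]
        # as_completed yields futures as they finish, not in submission order
        for fut in as_completed(futures):
            _, rows = fut.result()
            output.extend(rows)
    return output


if __name__ == "__main__":
    # Hypothetical file contents, already cut into 4 splits of 3 rows each.
    file_rows = [f"row{i:02d}" for i in range(12)]
    splits = [file_rows[i:i + 3] for i in range(0, len(file_rows), 3)]
    for seed in range(3):
        result = parallel_select(splits, workers=4, seed=seed)
        print(result == file_rows, result[:3])
    # Sorting on a key (the ORDER BY analogue) gives the same order every time.
    print(sorted(parallel_select(splits, workers=4)) == file_rows)
```

Every run returns the same set of rows, but the order of the per-split chunks can change between runs. Sorting the output on a key, which is what ORDER BY does, is what makes the order deterministic.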

Remove all these factors and the order becomes more predictable. For example, a single-threaded sequential file read may return rows in the same order as they appear in the file. But this is not how a database works.
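
For contrast, a minimal Python sketch of the single-threaded case: one reader makes one pass over one file, so rows come back in file order. The function name and sample data are hypothetical, and the preserved order is only a side effect of this particular reading strategy, not a promise a query engine makes.

```python
import io
from typing import Iterator, TextIO


def sequential_select(f: TextIO) -> Iterator[str]:
    """One reader, one pass from start to end: rows come out in file order."""
    for line in f:
        yield line.rstrip("\n")


if __name__ == "__main__":
    # io.StringIO stands in for a real data file (hypothetical contents).
    data = io.StringIO("a,1\nb,2\nc,3\n")
    print(list(sequential_select(data)))  # ['a,1', 'b,2', 'c,3']
```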

Also, according to Codd's relational theory, the order of columns and rows is immaterial.