Can we predict the order of the results of a Hive SELECT * query?

Question

Is it possible that the order of the results of a SELECT * query (with no ORDER BY) is always the same, provided that the same DBMS is used as the Metastore?

In other words: as long as MySQL is used as the Metastore, would the results of a SELECT * query always come back in the same order? And if Postgres is used, would the order again always be the same on the same data, but different from the order seen with MySQL? In both cases I mean exactly the same data.

Maybe it all boils down to two questions: what is the default order of results, and why does it differ between a MySQL and a Postgres Metastore?

Answer

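There is no such thing as a default row order: without ORDER BY, the order is not guaranteed. This has nothing to do with which database backs the Metastore. The Metastore holds only metadata (schemas, partition and file locations); the rows themselves are read from the table's files in HDFS or other storage, so the Metastore database plays no part in the order in which they come back.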
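A small Python sketch of what ORDER BY does and does not pin down. It is a toy model, not Hive code: `order_by` is a hypothetical helper standing in for a sort, and the two `arrival` lists stand in for two runs in which the same rows happened to arrive in a different order. Sorting on a non-unique key still leaves tied rows in whatever order they arrived, so only a key that distinguishes every row makes the output fully repeatable.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, int]

def order_by(rows: List[Row], key: Callable[[Row], object]) -> List[Row]:
    """Hypothetical stand-in for ORDER BY: sort rows by the given key.

    Python's sort is stable, so rows with equal keys keep the order in
    which they arrived; a real engine makes no promise about ties at all.
    """
    return sorted(rows, key=key)

# The same three rows, received in two different orders on two runs.
arrival_a: List[Row] = [("b", 2), ("a", 1), ("a", 3)]
arrival_b: List[Row] = [("a", 3), ("b", 2), ("a", 1)]

# Sorting on the first column only: the two "a" rows tie.
by_name_a = order_by(arrival_a, key=lambda r: r[0])
by_name_b = order_by(arrival_b, key=lambda r: r[0])

# Sorting on both columns: every row has a distinct key.
by_both_a = order_by(arrival_a, key=lambda r: (r[0], r[1]))
by_both_b = order_by(arrival_b, key=lambda r: (r[0], r[1]))

print(by_name_a)  # [('a', 1), ('a', 3), ('b', 2)]
print(by_name_b)  # [('a', 3), ('a', 1), ('b', 2)]
print(by_both_a == by_both_b)  # True
```

The tie-breaking column plays the same role as adding a unique column to an ORDER BY clause: it removes the last bit of freedom the engine has in arranging rows.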

In general, data is read in parallel by many processes (mappers). After the input splits are calculated, each process starts reading a piece of a file, or a few files, depending on those splits. The parallel processes may handle different volumes of data and run on different nodes, and the load is not the same each time. So they start returning rows and finish at different times, depending on many factors: node load, network load, volume of data per process, and so on. The fewer of these factors are in play, the more predictable the order becomes. For example, a single-threaded sequential read of a file returns rows in the same order as they appear in the file. But that is not how the database works.
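The effect described above can be imitated with a toy Python model. As a simplifying assumption, each mapper reads one contiguous split in file order, starts after a random delay, and emits rows at its own constant rate. `simulate_parallel_read` and its parameters are hypothetical. The random delays merely stand in for node load, network load and data volume, and the model does not describe how Hive actually schedules or collects mapper output.

```python
import random
from typing import List, Sequence, Tuple

def simulate_parallel_read(rows: Sequence[int], num_mappers: int, seed: int) -> List[int]:
    """Toy model: return rows in the order a client might receive them
    when several mappers read contiguous splits in parallel."""
    rng = random.Random(seed)  # the seed stands for "this particular run"
    size = -(-len(rows) // num_mappers)  # ceiling division: rows per split
    splits = [list(rows[i:i + size]) for i in range(0, len(rows), size)]

    events: List[Tuple[float, int]] = []
    for split in splits:
        start = rng.uniform(0.0, 1.0)    # when this mapper gets going
        per_row = rng.uniform(0.1, 1.0)  # how long it takes per row
        for j, row in enumerate(split):
            # Within one split, rows are emitted in file order.
            events.append((start + (j + 1) * per_row, row))

    # The client sees rows in order of emission time, across all mappers.
    events.sort(key=lambda e: e[0])
    return [row for _, row in events]

rows = list(range(12))
for run in range(3):
    print(simulate_parallel_read(rows, num_mappers=3, seed=run))
```

Different seeds (different "runs") will generally interleave the splits differently. Each split's own rows nevertheless stay in file order, and `sorted(...)` recovers the same list every time, which is the role ORDER BY plays. With `num_mappers=1` the model reduces to the single-threaded sequential read mentioned in the answer and returns the rows in file order.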

Also, according to Codd's relational model, the order of rows and columns is immaterial to the database: a relation is a set of rows, and a set has no order.
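One practical consequence is shown here as a Python sketch, assuming query results have already been fetched into lists of tuples. When you compare the output of a query without ORDER BY across runs or environments (for example, a cluster with a MySQL Metastore against one with Postgres), compare the rows as a multiset rather than as a sequence. `same_rows` is a hypothetical helper. `collections.Counter` is used instead of `set` because SQL results, unlike relations in Codd's model, may contain duplicate rows.

```python
from collections import Counter
from typing import Hashable, Iterable

def same_rows(a: Iterable[Hashable], b: Iterable[Hashable]) -> bool:
    """True if both results contain the same rows with the same
    multiplicities, regardless of the order they came back in."""
    return Counter(a) == Counter(b)

run_1 = [(1, "x"), (2, "y"), (2, "y")]
run_2 = [(2, "y"), (1, "x"), (2, "y")]   # same rows, different order
run_3 = [(1, "x"), (2, "y")]             # one duplicate row is missing

print(run_1 == run_2)             # False: positional comparison
print(same_rows(run_1, run_2))    # True
print(same_rows(run_1, run_3))    # False
print(set(run_1) == set(run_3))   # True: a plain set misses the lost duplicate
```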
