Pig vs Hive vs Native MapReduce


Question

I have a basic understanding of what the Pig and Hive abstractions are, but I don't have a clear idea of the scenarios that call for Hive, Pig, or native MapReduce.

I went through a few articles, which basically point out that Hive is for structured processing and Pig is for unstructured processing. When do we need native MapReduce? Can you point out a few scenarios that can't be solved with Pig or Hive but can be solved in native MapReduce?

Recommended answer

Complex branching logic with many nested if .. else .. structures is easier and quicker to implement in standard MapReduce. For processing structured data you could use Pangool, which also simplifies things like JOIN. Standard MapReduce also gives you full control to minimize the number of MapReduce jobs your data processing flow requires, which translates into better performance. However, it takes more time to code and to introduce changes.
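As a rough illustration of the branching point, here is a minimal pure-Python sketch that simulates a single map/reduce pass. It does not use Hadoop at all; the record fields (`status`, `country`, `amount`, `express`), the output keys, and the `run_job` helper are all hypothetical and exist only for this example. The idea is that arbitrary nested conditions live naturally inside the map function, and the whole flow still fits in one job.

```python
from collections import defaultdict


def mapper(record):
    """Emit (key, amount) pairs; nested branching picks the output key.

    All field names here are hypothetical, chosen for illustration only.
    """
    if record["status"] == "cancelled":
        return  # cancelled orders emit nothing
    amount = record["amount"]
    if record["country"] == "US":
        if amount >= 100:
            yield ("us_large", amount)
        else:
            yield ("us_small", amount)
    else:
        if record["express"]:
            yield ("intl_express", amount)
        else:
            yield ("intl_standard", amount)


def reducer(key, values):
    """Sum all amounts that were emitted under one key."""
    return key, sum(values)


def run_job(records):
    """Simulate map -> shuffle (group by key) -> reduce in a single pass."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


orders = [
    {"status": "ok", "country": "US", "amount": 150, "express": False},
    {"status": "ok", "country": "US", "amount": 20, "express": False},
    {"status": "cancelled", "country": "US", "amount": 500, "express": False},
    {"status": "ok", "country": "DE", "amount": 70, "express": True},
    {"status": "ok", "country": "DE", "amount": 30, "express": True},
    {"status": "ok", "country": "FR", "amount": 10, "express": False},
]
totals = run_job(orders)
```

In a real Hadoop job the same branching would sit in the `map` method of a Mapper class; the sketch only shows why that style of logic is comfortable to write in imperative code.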

Apache Pig is good for structured data too, but its advantage is the ability to work with BAGs of data (all rows that are grouped on a key). That makes it simpler to implement things like:

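  1. Getting the top N elements for each group;
  2. Calculating a total for each group and then setting that total against each row in the group;
  3. Using Bloom filters for JOIN optimization;
  4. Multiquery support (where Pig tries to minimize the number of MapReduce jobs by doing more work in a single job).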
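To make the first two items concrete, here is a small pure-Python sketch of what working with a bag per key looks like. It is not Pig Latin and does not reproduce Pig's execution model; the function names and the `dept`/`sales` fields are hypothetical. In Pig, the comparable pattern is roughly a GROUP followed by a nested FOREACH over each bag.

```python
from collections import defaultdict
import heapq


def group_by_key(rows, key):
    """Collect rows into one 'bag' (a list) per distinct key value."""
    bags = defaultdict(list)
    for row in rows:
        bags[row[key]].append(row)
    return bags


def top_n_per_group(rows, key, field, n):
    """Item 1: keep the n rows with the largest `field` in each bag."""
    return {
        k: heapq.nlargest(n, bag, key=lambda r: r[field])
        for k, bag in group_by_key(rows, key).items()
    }


def with_group_total(rows, key, field):
    """Item 2: attach the bag total, and each row's share of it, to every row."""
    out = []
    for bag in group_by_key(rows, key).values():
        total = sum(r[field] for r in bag)
        for r in bag:
            share = r[field] / total if total else 0.0
            out.append({**r, "group_total": total, "share": share})
    return out


rows = [
    {"dept": "a", "name": "x", "sales": 10},
    {"dept": "a", "name": "y", "sales": 30},
    {"dept": "a", "name": "z", "sales": 60},
    {"dept": "b", "name": "w", "sales": 5},
]
top2 = top_n_per_group(rows, "dept", "sales", 2)
shares = with_group_total(rows, "dept", "sales")
```

Both functions need the whole bag for a key at once, which is exactly the access pattern that is awkward to express in plain SQL-style row processing and natural when grouped rows are a first-class value.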

Hive is better suited for ad-hoc queries, but its main advantage is that it has an engine that stores and partitions data. Its tables can also be read from Pig or standard MapReduce.
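One way to picture the partitioning point: Hive lays out a partitioned table as nested directories named `column=value` under the table's root, so a reader that only needs some partitions can skip the other directories. The sketch below models that layout in plain Python. The warehouse path, table name, and helper functions are hypothetical, and real Hive additionally escapes special characters in partition values, which this sketch ignores.

```python
import posixpath


def partition_path(table_root, partition):
    """Build a Hive-style directory path from ordered partition columns."""
    segments = [f"{col}={val}" for col, val in partition.items()]
    return posixpath.join(table_root, *segments)


def parse_partition(path, table_root):
    """Recover {column: value} from the col=value segments under table_root."""
    relative = path[len(table_root):].strip("/")
    return dict(seg.split("=", 1) for seg in relative.split("/") if "=" in seg)


def prune(paths, table_root, **wanted):
    """Keep only paths whose partition values match every requested column."""
    kept = []
    for path in paths:
        values = parse_partition(path, table_root)
        if all(values.get(col) == val for col, val in wanted.items()):
            kept.append(path)
    return kept


root = "/warehouse/sales"  # hypothetical table location
paths = [
    partition_path(root, {"dt": "2024-01-01", "country": "US"}),
    partition_path(root, {"dt": "2024-01-01", "country": "DE"}),
    partition_path(root, {"dt": "2024-01-02", "country": "US"}),
]
us_only = prune(paths, root, country="US")
```

Because the data ends up as ordinary files in ordinary directories, other tools can read the same tables, which is the interoperability the answer refers to.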

One more thing: Hive and Pig are not well suited to working with hierarchical data.
