In spark join, does table order matter like in pig?


Question

Related: Spark - joining 2 PairRDD elements


When doing a regular join in pig, the last table in the join is not brought into memory but streamed through instead, so if A has small cardinality per key and B large cardinality, it is significantly better to do join A, B than join B, A, from a performance perspective (avoiding spill and OOM).


Is there a similar concept in spark? I didn't see any such recommendation, and wonder how it is possible? The implementation looks to me pretty much the same as in pig: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala


Or am I missing something?
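For reference, the situation being asked about would look something like this in Spark. This is only an illustrative sketch: the RDD names a and b, the keys, and the data sizes are made up to mirror the "small cardinality per key" vs. "large cardinality per key" setup from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JoinOrderQuestion {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-order").setMaster("local[*]"))

    // a: few values per key (small cardinality per key)
    val a = sc.parallelize(Seq(("k1", 1), ("k2", 2)))
    // b: many values per key (large cardinality per key)
    val b = sc.parallelize((1 to 100000).map(i => ("k" + (i % 2 + 1), i)))

    // The question: is a.join(b) any different from b.join(a) in terms of
    // what gets materialized in memory, the way join order matters in Pig?
    val ab = a.join(b)
    val ba = b.join(a)
    println(ab.count() + " " + ba.count())

    sc.stop()
  }
}
```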

Answer


It does not make a difference; in Spark an RDD is only brought into memory if it is cached. So in Spark, to achieve the same effect, you can cache the smaller RDD. Another thing you can do in Spark, which I'm not sure Pig does, is that if all the RDDs being joined have the same partitioner, no shuffle needs to be done.
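A minimal sketch of both suggestions from the answer: cache the smaller RDD, and give both sides the same partitioner so the join itself needs no shuffle. The RDD names small and large, the sample data, and the partition count of 8 are assumptions for illustration, not from the original answer.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object CachedCoPartitionedJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("copartitioned-join").setMaster("local[*]"))

    val small = sc.parallelize(Seq(("k1", 1), ("k2", 2)))
    val large = sc.parallelize((1 to 100000).map(i => ("k" + (i % 2 + 1), i)))

    val partitioner = new HashPartitioner(8)

    // Cache the smaller RDD so it is kept in memory once computed.
    val smallByKey = small.partitionBy(partitioner).cache()
    // Give the larger RDD the same partitioner; when both sides of the
    // join share a partitioner, the join needs no further shuffle.
    val largeByKey = large.partitionBy(partitioner)

    val joined = smallByKey.join(largeByKey)
    println(joined.count())

    sc.stop()
  }
}
```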
