pig skewed join with a big table causes "Split metadata size exceeded 10000000"
Problem description
We have a Pig join between a small (16M-row) distinct table and a big (6B-row) skewed table.
A regular join finishes in 2 hours (after some tweaking). We tried using skewed
and were able to improve the performance to 20 minutes.
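For context, a minimal sketch of such a join in Pig Latin (relation, path, and field names are hypothetical, not from the original script; `USING 'skewed'` is Pig's skewed-join strategy):

```
-- Hypothetical inputs; 'skewed' tells Pig to sample the right-hand
-- relation and partition heavily skewed keys across reducers
small = LOAD 'small_table' AS (key:chararray, val:chararray);
big   = LOAD 'big_table'   AS (key:chararray, other:chararray);
j     = JOIN small BY key, big BY key USING 'skewed';
```

The sampling pass that `'skewed'` triggers is the SAMPLER job mentioned in the error below.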
HOWEVER, when we try a bigger skewed table (19B rows), we get this message from the SAMPLER job:
Split metadata size exceeded 10000000. Aborting job job_201305151351_21573 [ScriptRunner]
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:48)
at org.apache.hadoop.mapred.JobInProgress.createSplits(JobInProgress.java:817) [ScriptRunner]
This is reproducible every time we try using skewed, and does not happen when we use the regular join.
We tried setting mapreduce.jobtracker.split.metainfo.maxsize=-1, and we can see it's there in the job.xml file, but it doesn't change anything!
What's happening here? Is this a bug in the distribution sample created by using skewed? Why doesn't changing the param to -1 help?
Accepted answer
In newer versions of Hadoop (>=2.4.0 but maybe even earlier) you should be able to set the maximum split size at the job level by using the following configuration property:
mapreduce.job.split.metainfo.maxsize=-1
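In a Pig script, a per-job property like this can be set with Pig's `set` command at the top of the script (a sketch; note this is the `mapreduce.job.*` property, which superseded the older `mapreduce.jobtracker.*` one the question tried):

```
-- Disable the 10 MB split metadata limit for this job (-1 = no limit)
set mapreduce.job.split.metainfo.maxsize -1;
```

Alternatively, it can be passed on the command line when launching the script, e.g. via `pig -Dmapreduce.job.split.metainfo.maxsize=-1`.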