Hive Broken pipe error


Problem description


I have been working on a project that includes a Hive query.

INSERT INTO OVERWRITE .... TRANSFORM (....) USING 'python script.py' FROM .... LEFT OUTER JOIN ... LEFT OUTER JOIN ... LEFT OUTER JOIN

At the beginning everything worked fine until we loaded a big amount of dummy data. We just wrote the same records with small variations on some fields. After that we ran the query again and got a Broken pipe error without much information. There is no log about the error, just the IOException: Broken pipe error.
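For context, a Hive TRANSFORM query streams each joined input row to the script's stdin and reads the script's stdout back, so an IOException: Broken pipe usually means one end of that pipe (typically the script process) closed before Hive finished writing. A minimal sketch of the pattern follows; the table and column names are hypothetical, since the real ones are elided above:

ADD FILE script.py;

INSERT OVERWRITE TABLE result_table           -- hypothetical output table
SELECT TRANSFORM (a.id, a.value, b.extra)     -- these columns are piped to script.py's stdin
       USING 'python script.py'
       AS (id STRING, score STRING)           -- columns read back from the script's stdout
FROM main_table a                             -- hypothetical input tables
LEFT OUTER JOIN lookup_one b ON (a.id = b.id)
LEFT OUTER JOIN lookup_two c ON (a.id = c.id);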

To simplify the script and isolate errors, we modified it to:

import sys

# Identity pass-through: echo every input row back unchanged (Python 2 print).
for line in sys.stdin.readlines():
    print line

to avoid any error at that level. We still have the same error.

Solution

The problem seems to be solved by splitting the many joins into separate queries and using intermediate tables. Then you just add a final query with a last join that summarizes all the previous results. As I understand it, this means there was no error at the script level, but too much data for Hive to handle in a single query.
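A rough sketch of that split, assuming hypothetical table names (stage_one, stage_two, result_table and the lookup tables are made up for illustration):

-- Step 1: materialize parts of the join chain into intermediate tables.
CREATE TABLE stage_one AS
SELECT a.id, a.value, b.extra
FROM main_table a
LEFT OUTER JOIN lookup_one b ON (a.id = b.id);

CREATE TABLE stage_two AS
SELECT c.id, c.more
FROM lookup_two c
LEFT OUTER JOIN lookup_three d ON (c.id = d.id);

-- Step 2: one final query joins the intermediate tables and feeds the streaming script.
ADD FILE script.py;

INSERT OVERWRITE TABLE result_table
SELECT TRANSFORM (s1.id, s1.value, s1.extra, s2.more)
       USING 'python script.py'
       AS (id STRING, score STRING)
FROM stage_one s1
LEFT OUTER JOIN stage_two s2 ON (s1.id = s2.id);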
