STORE output to a single CSV?

Question

Currently, when I STORE into HDFS, it creates many part files.

Is there any way to store the output as a single CSV file?

Answer

You can do this in a few ways:

  • To set the number of reducers for all Pig operations, you can use the default_parallel property. However, this means every step will run with a single reducer, which reduces throughput:

set default_parallel 1;
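For illustration, a complete script using the global setting might look like the sketch below. The input path `input/events`, its two-column schema, and the output path `output/totals` are hypothetical. Because GROUP forces a reduce phase and every reduce phase now runs with one reducer, the output directory should contain a single part file. Note that STORE still writes a directory; the single CSV is the one part file inside it.

```pig
-- Sketch only: the paths and the (grp, val) schema are hypothetical.
-- default_parallel only affects reduce phases; a map-only script
-- would still write one part file per map task.
set default_parallel 1;

events  = LOAD 'input/events' USING PigStorage(',') AS (grp:chararray, val:int);

-- GROUP forces a reduce phase, which now runs with a single reducer.
grouped = GROUP events BY grp;
totals  = FOREACH grouped GENERATE group AS grp, SUM(events.val) AS total;

-- Written as comma-separated text into the directory 'output/totals'.
STORE totals INTO 'output/totals' USING PigStorage(',');
```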

Alternatively, if the last operation before STORE is one that runs in a reduce phase (COGROUP, CROSS, DISTINCT, GROUP, JOIN (inner), JOIN (outer), or ORDER BY), you can add the PARALLEL 1 keyword to that operation so that only this command runs with a single reducer:

b = GROUP a BY grp PARALLEL 1;
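A sketch of this per-operator approach follows. The input path `input/events`, the `(grp, val)` schema, and the output path `output/totals` are hypothetical. No global default_parallel is set, so only the GROUP is forced onto one reducer; the FOREACH that follows it runs in the same reduce phase, so the STORE should produce one part file.

```pig
-- Sketch only: the paths and the (grp, val) schema are hypothetical.
events  = LOAD 'input/events' USING PigStorage(',') AS (grp:chararray, val:int);

-- Only this GROUP (and the reduce phase it creates) uses a single reducer.
grouped = GROUP events BY grp PARALLEL 1;

-- Runs in the same reduce phase as the GROUP above.
totals  = FOREACH grouped GENERATE group AS grp, SUM(events.val) AS total;

-- Written as comma-separated text into the directory 'output/totals'.
STORE totals INTO 'output/totals' USING PigStorage(',');
```

Adding another reduce-side operation (for example an ORDER BY without PARALLEL 1) between the GROUP and the STORE would run with the default parallelism again and could split the output back into several part files.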

See the "Parallel Features" section of the Pig Cookbook for more information.