STORE output to a single CSV?
Question
Currently, when I STORE into HDFS, it creates many part files.
Is there any way to store the output to a single CSV file?
Recommended answer
You can do this in a few ways:
- To set the number of reducers for all Pig operations, you can use the default_parallel property. Note that this makes every step run with a single reducer, which decreases throughput:
set default_parallel 1;
- If one of the operations you execute before calling STORE is COGROUP, CROSS, DISTINCT, GROUP, JOIN (inner), JOIN (outer), or ORDER BY, you can add the PARALLEL 1 clause to that operation so that it runs with a single reducer:
b = GROUP a BY grp PARALLEL 1;
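As a rough end-to-end sketch of the second approach, the script below forces a single reducer on the last reduce-side operation before STORE. The input path, output path, and schema are hypothetical and only for illustration; PigStorage(',') is used to write comma-separated fields.

```pig
-- Hypothetical input path and schema, for illustration only.
a = LOAD 'input/data.tsv' USING PigStorage('\t') AS (grp:chararray, val:int);

-- ORDER BY triggers a reduce phase; PARALLEL 1 makes it use one reducer,
-- so the STORE that follows should write a single part file.
sorted = ORDER a BY grp PARALLEL 1;

-- PigStorage(',') writes each tuple as one comma-separated line.
STORE sorted INTO 'output/single_csv' USING PigStorage(',');
```

Note that the output location is still a directory; with one reducer it should contain a single part file rather than a file with a .csv name.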
See Pig Cookbook - Parallel Features for more information.