STORE output to a single CSV?
Question
Currently, when I STORE into HDFS, it creates many part files.
Is there any way to store the output in a single CSV file?
Recommended answer
You can do this in a few ways:
To set the number of reducers for all Pig operations, you can use the default_parallel property, but this means every step will use a single reducer, decreasing throughput:
set default_parallel 1;
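As a rough sketch of how this setting fits into a complete script, the example below loads a CSV, sorts it, and stores the result. The paths, the schema, and the aliases (a, b) are hypothetical and only there for illustration. Note that default_parallel affects reduce-side operators only, so a script with no such operator (a map-only job) may still write one part file per mapper.

```pig
-- Apply a single reducer to every reduce-side operator in this script.
SET default_parallel 1;

-- Hypothetical input path and schema, for illustration only.
a = LOAD 'input/data.csv' USING PigStorage(',') AS (grp:chararray, val:int);

-- ORDER BY needs a reduce phase, so it picks up default_parallel.
b = ORDER a BY val;

-- Hypothetical output path; the result is still a directory,
-- but with one reducer it should contain a single part file.
STORE b INTO 'output/sorted_csv' USING PigStorage(',');
```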
Prior to calling STORE, if one of the operations executed is COGROUP, CROSS, DISTINCT, GROUP, JOIN (inner), JOIN (outer), or ORDER BY, you can add the PARALLEL 1 keyword so that the command runs with a single reducer:
GROUP a BY grp PARALLEL 1;
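For context, here is a minimal sketch of the per-operator form inside a full script. The paths, the field names (grp, val), and the aliases are assumptions for illustration, not part of the original question. Only the operator carrying PARALLEL 1 is limited to one reducer, so earlier steps can keep their normal parallelism.

```pig
-- Hypothetical input path and schema, for illustration only.
a = LOAD 'input/data.csv' USING PigStorage(',') AS (grp:chararray, val:int);

-- PARALLEL 1 limits only this reduce-side operator to one reducer.
b = GROUP a BY grp PARALLEL 1;

-- Aggregate each group; this runs in the same reduce phase as the GROUP.
c = FOREACH b GENERATE group AS grp, SUM(a.val) AS total;

-- Hypothetical output path; with one reducer in the final job,
-- the output directory should hold a single comma-separated part file.
STORE c INTO 'output/single_csv' USING PigStorage(',');
```

For the output to end up in one file, the PARALLEL 1 clause needs to sit on the last reduce-side operator before STORE, since that operator's reducer count determines how many part files are written.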
See Pig Cookbook - Parallel Features for more information.