Apache Flink output to CSV file for each GroupedDataSet
Problem description
I want to output each groupedDataSet to a CSV file.
Example data:
A,123
B,200
A,400
B,400
So the output I want is:
File 1:
A,123
A,400
File 2:
B,200
B,400
So basically, simple code over exampleData:
exampleData.groupBy(0).sortGroup(1, Order.ASCENDING)
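For context, a minimal batch-job sketch around that snippet might look like the following (the name exampleData is from the question; the inline sample elements and the object name are filled in here for illustration only):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.common.operators.Order

object GroupedCsvExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Hypothetical input matching the sample data above
    val exampleData: DataSet[(String, Int)] = env.fromElements(
      ("A", 123), ("B", 200), ("A", 400), ("B", 400))

    // Group by the key field and sort within each group
    val grouped = exampleData.groupBy(0).sortGroup(1, Order.ASCENDING)

    // Open question: how to write each group to its own CSV file
  }
}
```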
Now I want to output each group to a different CSV file. What is the best practice to achieve this?
I'm using Scala version 2.11.12 and Flink version 1.11.0.
Recommended answer
What you need is a bucketing sink, but that's currently only supported for streaming jobs, not batch. Flink 1.12 unifies batch &amp; streaming, so in theory that might work for you. I implemented my own bucketing sink for batch jobs, but it seems to have some issues with recent versions of Hadoop, which I still need to debug.
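If switching to the streaming API is acceptable, the bucketing idea can be sketched with Flink's StreamingFileSink and a custom BucketAssigner that buckets by the key field, so each key's records land in their own subdirectory. This is an illustrative sketch, not the answerer's own batch sink; the output path is a placeholder:

```scala
import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.{BucketAssigner, StreamingFileSink}
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer
import org.apache.flink.streaming.api.scala._

// Buckets each record by its key field, so all "A" records go to
// one directory and all "B" records to another
class KeyBucketAssigner extends BucketAssigner[(String, Int), String] {
  override def getBucketId(element: (String, Int),
                           context: BucketAssigner.Context): String = element._1
  override def getSerializer = SimpleVersionedStringSerializer.INSTANCE
}

object BucketedCsvJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val exampleData: DataStream[(String, Int)] = env.fromElements(
      ("A", 123), ("B", 200), ("A", 400), ("B", 400))

    val sink = StreamingFileSink
      .forRowFormat(new Path("/tmp/output"),            // placeholder path
                    new SimpleStringEncoder[(String, Int)]("UTF-8"))
      .withBucketAssigner(new KeyBucketAssigner)
      .build()

    exampleData.addSink(sink)
    env.execute("bucketed csv output")
  }
}
```

Note that a streaming sink writes part files per bucket rather than one exact file per key, and the per-group sortGroup ordering from the batch snippet is not preserved here.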