How to generate a dynamic output path for a DataSet during the output method
Problem description
Is there a way to create a dynamic DataSink output path in Flink?
The DataSet has elements of type Tuple2<String, String>.
When we tried this with the streaming API, I was able to generate a dynamic path using a custom Bucketer, like below:
@Override
public Path getBucketPath(Clock clock, Path basePath, Tuple2<String, String> element) {
    // Bucket by the first tuple field, normalized to lower case without surrounding whitespace.
    return new Path(basePath + "/schema=" + element.f0.toLowerCase().trim() + "/");
}
I would like to know whether there is a similar way to generate a custom path when working with a DataSet.
Recommended answer
I poked around a bit and didn't find anything similar for batch processing. I think that means you'd have to create your own OutputFormat class that wraps a regular FileOutputFormat and does the bucketing, using the same Bucketer interface.
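To make the idea concrete, here is a minimal sketch of the bucketing logic, written without any Flink dependency so it can be run on its own. It is an illustration under stated assumptions, not Flink's API: the class name BucketingTextOutput is hypothetical, a pair of Strings stands in for Tuple2<String, String>, and plain java.nio writers are used in place of wrapped FileOutputFormat instances. The method names open, writeRecord, and close mirror the lifecycle of Flink's OutputFormat interface, so the same structure should carry over to a real implementation of OutputFormat<Tuple2<String, String>>.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, dependency-free stand-in for a custom Flink OutputFormat.
// Assumption: in Flink this class would implement
// OutputFormat<Tuple2<String, String>> and delegate to FileOutputFormat
// instances instead of opening java.nio writers directly.
class BucketingTextOutput {
    private final Path basePath;
    // One open writer per bucket directory, created lazily.
    private final Map<Path, BufferedWriter> writers = new HashMap<>();
    private int taskNumber;

    BucketingTextOutput(Path basePath) {
        this.basePath = basePath;
    }

    // Same bucketing rule as the streaming Bucketer in the question.
    Path getBucketPath(String schema) {
        return basePath.resolve("schema=" + schema.toLowerCase().trim());
    }

    // Mirrors OutputFormat.open(taskNumber, numTasks): remember which
    // parallel task this is so each task writes its own part file.
    // numTasks is unused in this sketch.
    void open(int taskNumber, int numTasks) {
        this.taskNumber = taskNumber;
    }

    // Mirrors OutputFormat.writeRecord: route the value to its bucket,
    // opening the bucket's part file the first time the bucket is seen.
    void writeRecord(String schema, String value) throws IOException {
        Path bucket = getBucketPath(schema);
        BufferedWriter writer = writers.get(bucket);
        if (writer == null) {
            Files.createDirectories(bucket);
            writer = Files.newBufferedWriter(
                    bucket.resolve("part-" + taskNumber), StandardCharsets.UTF_8);
            writers.put(bucket, writer);
        }
        writer.write(value);
        writer.newLine();
    }

    // Mirrors OutputFormat.close: flush and close every bucket writer.
    void close() throws IOException {
        for (BufferedWriter writer : writers.values()) {
            writer.close();
        }
        writers.clear();
    }
}
```

Writing the records (" Orders ", "a"), ("users", "b"), and ("ORDERS", "c") through this class with task number 0 produces two directories under the base path, schema=orders and schema=users, each holding a part-0 file. Keeping one writer open per bucket is simple, but it holds a file handle for every distinct key, so a real implementation may want to cap or evict open writers when the number of schemas is large.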