EOFException related to memory segments during run of Beam pipeline on Flink
Question
I'm trying to run an Apache Beam pipeline on Flink on our test cluster. It has been failing with an EOFException at org.apache.flink.runtime.io.disk.SimpleCollectingOutputView:79 during the encoding of an object through serialisation. I haven't been able to reproduce the error locally yet. You can find the entire job log here. Some values have been replaced with fake data.
Command used to run the pipeline:
bin/flink run \
-m yarn-cluster \
--yarncontainer 1 \
--yarnslots 4 \
--yarnjobManagerMemory 2000 \
--yarntaskManagerMemory 2000 \
--yarnname "EBI" \
pipeline.jar \
--runner=FlinkRunner \
--zookeeperQuorum=hdp-master-001.fake.org:2181
While I think it's not related, the object-to-be-serialised is serialisable and has had both an implicit and an explicit coder, but this doesn't affect the situation.
What might be causing this situation and what can I do to address it?
For now, increasing the task manager's heap memory to somewhere between 4 and 8 GiB seems to prevent the exception. I'm still not sure whether this should be normal Flink behaviour (shouldn't it spill to disk?), and it doesn't seem like a solution that scales.
Answer
The EOFException is thrown because Flink ran out of memory buffers. Flink expects an EOFException as a notification to start to write data to disk.
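To make this control flow concrete, here is a minimal self-contained sketch of the pattern described above. All class names are illustrative stand-ins, not Flink's actual API: the in-memory writer signals "buffers exhausted" by throwing an EOFException, and the caller reacts by spilling the record to disk.

```java
import java.io.EOFException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Flink's in-memory output view: it throws
// EOFException when the fixed-size buffer is full.
class InMemoryBuffer {
    private final byte[] buf;
    private int pos = 0;

    InMemoryBuffer(int capacity) { buf = new byte[capacity]; }

    void write(byte[] record) throws EOFException {
        if (pos + record.length > buf.length) {
            // This exception is a signal, not an error: "out of memory buffers".
            throw new EOFException("out of memory buffers");
        }
        System.arraycopy(record, 0, buf, pos, record.length);
        pos += record.length;
    }
}

// Hypothetical caller that interprets the EOFException as "start spilling".
class SpillingWriter {
    final InMemoryBuffer memory = new InMemoryBuffer(8);
    final List<byte[]> disk = new ArrayList<>(); // stand-in for a spill file

    void write(byte[] record) {
        try {
            memory.write(record);
        } catch (EOFException e) {
            disk.add(record); // memory is full: spill to "disk"
        }
    }
}
```

The key point is that the catch clause matches EOFException specifically; if something in between wrapped that exception in another type, the spill path would never be taken.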
This problem is caused by Beam's SerializableCoder wrapping the EOFException in a CoderException. Hence, Flink does not catch the expected EOFException and fails.
The problem can be solved by using a custom coder that does not wrap the EOFException but forwards it.
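A minimal sketch of what such a coder's encode path could look like, using only the JDK. The class name and structure here are illustrative assumptions (a real fix would subclass Beam's Coder); the point is the exception handling: EOFException is rethrown as-is so Flink can react to it, while other I/O errors are wrapped as a coder normally would.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Illustrative coder: unlike SerializableCoder, which wraps any IOException
// (including EOFException) in a CoderException, this version lets the
// EOFException propagate unchanged.
class ForwardingSerializableCoder<T extends Serializable> {
    public void encode(T value, OutputStream out) throws IOException {
        try {
            ObjectOutputStream oos = new ObjectOutputStream(out);
            oos.writeObject(value);
            oos.flush();
        } catch (EOFException e) {
            // Forward unchanged: Flink interprets this as
            // "out of memory buffers, start writing to disk".
            throw e;
        } catch (IOException e) {
            // Wrap everything else, as a coder normally would.
            throw new IOException("unable to encode value", e);
        }
    }
}
```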