Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
Problem description
While running a Kubeflow pipeline whose code uses TensorFlow 2.0, the following error is displayed at the end of each epoch:
W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
Also, after some epochs, it stops showing logs and fails with this error:
This step is in Failed state with this message: The node was low on resource: memory. Container main was using 100213872Ki, which exceeds its request of 0. Container wait was using 25056Ki, which exceeds its request of 0.
Recommended answer
This was caused by incompatible CUDA and TensorFlow versions. The following versions work well together:
tensorflow-gpu==2.0.0
tensorflow-addons==0.6.0
nvidia/cuda:10.0-cudnn7-runtime
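As a quick sanity check inside the pipeline step's container, a small script can compare the installed pip packages against the pinned versions listed in the answer. This is a minimal sketch, not part of the original answer: the helper names `installed_versions` and `find_mismatches` are hypothetical, it uses only the standard library's `importlib.metadata` (Python 3.8+), and it does not verify the CUDA/cuDNN versions, which come from the base image rather than from pip.

```python
from importlib import metadata

# Pip pins taken from the answer. The CUDA/cuDNN pair comes from the base image
# tag (nvidia/cuda:10.0-cudnn7-runtime), so it is not a pip package and is not
# checked by this script.
KNOWN_GOOD = {
    "tensorflow-gpu": "2.0.0",
    "tensorflow-addons": "0.6.0",
}


def installed_versions(packages):
    """Return {package: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(installed, expected=KNOWN_GOOD):
    """List human-readable differences between installed and expected pins."""
    problems = []
    for name, want in expected.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (expected {want})")
        elif have != want:
            problems.append(f"{name}: found {have}, expected {want}")
    return problems


if __name__ == "__main__":
    # Print each mismatch, or a single confirmation line if everything matches.
    for line in find_mismatches(installed_versions(KNOWN_GOOD)) or ["all pins match"]:
        print(line)
```

The comparison is an exact string match on purpose, since the answer names exact versions rather than compatible ranges.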