Is TensorFlow continuously polling an S3 filesystem during training or when using TensorBoard?

Question

I'm trying to use TensorBoard on my local machine to read TensorFlow logs stored on S3. Everything works, but TensorBoard continuously prints the following errors to the console. According to this, the cause is that when the TensorFlow S3 client checks whether a directory exists, it first runs Stat on it, because S3 has no native way to check whether a directory exists. It then checks whether a key with that exact name exists, and that check fails with these error messages.
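The directory check described above can be illustrated with a simplified, in-memory model of an object store. This is a minimal sketch under stated assumptions: `FakeBucket`, `head`, `list_prefix` and `is_directory` are hypothetical names, and the code is not TensorFlow's actual S3 client. It only shows why "does this directory exist?" on S3 naturally produces a 404 for the exact key before a prefix listing finds the directory's contents.

```python
# Hypothetical, simplified model of an S3-like object store. This is NOT
# TensorFlow's S3 client; it only illustrates why checking a "directory"
# on S3 can log a 404 even when everything is working.


class FakeBucket:
    """In-memory stand-in for a bucket: a flat mapping of keys to bytes."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.log = []  # one (operation, key, HTTP status) tuple per request

    def head(self, key):
        # Like a HEAD request: 200 if an object with this exact key exists.
        status = 200 if key in self.objects else 404
        self.log.append(("HEAD", key, status))
        return status

    def list_prefix(self, prefix):
        # Like a list request: return every key starting with the prefix.
        self.log.append(("LIST", prefix, 200))
        return [k for k in self.objects if k.startswith(prefix)]


def is_directory(bucket, path):
    """Hypothetical check: stat the exact key first, then look for children."""
    if bucket.head(path) == 200:
        return False  # an object with this exact name exists, so it is a file
    # The 404 above is expected: "directories" are only key prefixes on S3.
    prefix = path.rstrip("/") + "/"
    return len(bucket.list_prefix(prefix)) > 0


bucket = FakeBucket({"logs/run1/events.out.tfevents.1": b""})
print(is_directory(bucket, "logs/run1"))  # True
print(bucket.log)  # [('HEAD', 'logs/run1', 404), ('LIST', 'logs/run1/', 200)]
```

Because the store is flat, a log directory like `logs/run1` never exists as an object of its own, so every such check records one 404 before succeeding. This matches the pattern of 404 responses in the log, although the real client's exact request sequence may differ.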

While this could be the desired behavior for model serving, where the server polls for updated models (and this can be stopped using file_system_poll_wait_seconds), I don't know how to stop it during training. In fact, the same thing happens during training if you save checkpoints and logs to S3. Suppressing these errors by raising the log level is not an option, because TensorFlow still continuously polls S3 and you pay for these useless requests.

I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2020-11-23 11:41:02.502274: E tensorflow/core/platform/s3/aws_logging.cc:60] HTTP response code: 404
Exception name: 
Error message: No response body.
6 response headers:
connection : close
content-type : application/xml
date : Mon, 23 Nov 2020 10:41:01 GMT
server : AmazonS3
x-amz-id-2 : ...
x-amz-request-id : ...
2020-11-23 11:41:02.502364: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2020-11-23 11:41:02.502699: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2020-11-23 11:41:03.327409: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2020-11-23 11:41:03.491773: E tensorflow/core/platform/s3/aws_logging.cc:60] HTTP response code: 404

Any ideas?

Recommended answer

I was wrong. TF just writes logs to S3, and although the errors are related to the linked issue, this is normal behavior. The extra costs are minimal because AWS doesn't charge for data transfer between services in the same region, only for the operations (requests) themselves. The same applies when using TensorBoard with S3. For anyone interested in these topics, I made a repository here.
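The cost argument in this answer boils down to simple arithmetic: the number of requests over a period multiplied by the per-request price. Below is a minimal sketch, assuming a fixed polling interval and a fixed number of requests per poll. Both values, and the function name `polling_cost`, are hypothetical. The price is a placeholder rather than a real AWS figure, so check the current Amazon S3 pricing for your region and request type.

```python
# Back-of-the-envelope estimate of what periodic polling costs in S3
# request charges. Every input is an assumption supplied by the caller;
# none of the values below are actual AWS prices.


def polling_cost(poll_interval_s, requests_per_poll, price_per_1000, days=30):
    """Return (number of requests, estimated cost) for `days` of polling."""
    seconds = days * 24 * 60 * 60
    polls = seconds / poll_interval_s
    requests = polls * requests_per_poll
    cost = requests / 1000 * price_per_1000
    return requests, cost


# Hypothetical scenario: one poll every 5 s, 2 requests per poll
# (e.g. a failed HEAD plus a LIST), and a placeholder price of 1.0
# currency units per 1,000 requests. Scale the result by your real price.
requests, cost = polling_cost(poll_interval_s=5, requests_per_poll=2, price_per_1000=1.0)
print(f"{requests:,.0f} requests -> {cost:,.2f} units")  # 1,036,800 requests -> 1,036.80 units
```

Multiply the result by the actual price per 1,000 requests. If that price is a small fraction of a currency unit, even about a million requests a month stays cheap, which is consistent with the answer's conclusion.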
