Optimal size of a tfrecord file

Question

From your experience, what would be an ideal size for a .tfrecord file that would work well across a wide variety of storage devices (hard disk, SSD, NVMe) and storage locations (local machine, HPC cluster with network mounts)?

If I get slower performance on a technically more powerful machine in the cloud than on my local PC, could the size of the TFRecord dataset be the root cause of the bottleneck?

Thanks

Answer

The official TensorFlow performance guide recommends ~100 MB (https://docs.w3cub.com/tensorflow~guide/performance/performance_guide/):

Reading large numbers of small files significantly impacts I/O performance. One approach to get maximum I/O throughput is to preprocess input data into larger (~100 MB) TFRecord files. For smaller data sets (200 MB-1 GB), the best approach is often to load the entire data set into memory.
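
As a rough illustration of that advice, here is a minimal sketch (my own example, not code from the guide) that writes serialized tf.train.Example protos into TFRecord shards of roughly 100 MB each and reads them back with tf.data, caching a small data set in memory after the first epoch. The file-name prefix, the examples iterable, and the exact shard-size target are assumptions for the example.

import tensorflow as tf

TARGET_SHARD_BYTES = 100 * 1024 * 1024  # ~100 MB per shard, per the guide

def write_sharded_tfrecords(examples, prefix="train"):
    """Write tf.train.Example protos, starting a new shard every ~100 MB."""
    shard_index, bytes_in_shard = 0, 0
    writer = tf.io.TFRecordWriter(f"{prefix}-{shard_index:05d}.tfrecord")
    for example in examples:  # each element is assumed to be a tf.train.Example
        record = example.SerializeToString()
        writer.write(record)
        bytes_in_shard += len(record)
        if bytes_in_shard >= TARGET_SHARD_BYTES:
            writer.close()
            shard_index, bytes_in_shard = shard_index + 1, 0
            writer = tf.io.TFRecordWriter(f"{prefix}-{shard_index:05d}.tfrecord")
    writer.close()

# Reading side: interleave shards for parallel I/O. For small data sets
# (roughly 200 MB-1 GB), cache() keeps the records in memory after the
# first epoch, matching the guide's advice to load them fully into memory.
files = tf.data.Dataset.list_files("train-*.tfrecord")
dataset = (files.interleave(tf.data.TFRecordDataset,
                            num_parallel_calls=tf.data.AUTOTUNE)
                .cache()
                .prefetch(tf.data.AUTOTUNE))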
