optimal size of a tfrecord file
Problem Description
From your experience, what is the ideal size of a .tfrecord file that would work best across a wide variety of storage devices (hard disk, SSD, NVMe) and storage locations (local machine, HPC cluster with network mounts)?
If I get slower performance on a technically more powerful machine in the cloud than on my local PC, could the size of the tfrecord dataset be the root cause of the bottleneck?
Thanks
Recommended Answer
The official TensorFlow performance guide recommends ~100 MB (https://docs.w3cub.com/tensorflow~guide/performance/performance_guide/):
Reading large numbers of small files significantly impacts I/O performance. One approach to get maximum I/O throughput is to preprocess input data into larger (~100 MB) TFRecord files. For smaller data sets (200 MB to 1 GB), the best approach is often to load the entire data set into memory.
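As a rough sketch of how the ~100 MB recommendation translates into a sharding plan (the dataset size and target shard size below are illustrative assumptions, not values from the question), you can compute how many TFRecord shards to split a dataset into from its total serialized size, then write one file per shard with `tf.io.TFRecordWriter`:

```python
import math

# Target shard size suggested by the TensorFlow performance guide.
TARGET_SHARD_BYTES = 100 * 1024 * 1024  # ~100 MB

def num_shards(total_bytes: int, target_shard_bytes: int = TARGET_SHARD_BYTES) -> int:
    """Number of TFRecord shards so that each file is at most ~target_shard_bytes.

    Always returns at least 1 shard, even for an empty dataset.
    """
    return max(1, math.ceil(total_bytes / target_shard_bytes))

# Hypothetical example: a 2.5 GB serialized dataset.
total = int(2.5 * 1024**3)
shards = num_shards(total)
print(shards)                      # → 26
print(total // shards)             # ~103 MB per shard, close to the 100 MB target

# Each shard would then be written as, e.g.,
# "train-00000-of-00026.tfrecord" ... "train-00025-of-00026.tfrecord"
# using tf.io.TFRecordWriter in a preprocessing script.
```

The `train-XXXXX-of-NNNNN` naming shown in the comment is a common convention, not a requirement; the point is simply that one large dataset becomes a moderate number of ~100 MB files instead of many tiny ones.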