Number of examples in each tfrecord


Question

I am running the sample.sh script in Google Cloud Shell, following the steps of the flowers example, to run the preprocessing step below on a set of images.

https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/flowers/trainer/preprocess.py

Preprocessing completed successfully on both the eval set and the train set. But the generated .tfrecord.gz files do not seem to match the number of images in eval/train_set.csv.

For example, eval-00000-of-00157.tfrecord.gz suggests there are 158 tfrecords, while eval_set.csv has 35227 rows. Every row includes a valid image_url (all images are uploaded to Storage) and a valid label.

Is there a way to monitor and control the number of images per tfrecord in the preprocess.py config?

Thanks

Update: got this working:

import os

import tensorflow as tf
from tensorflow.python.lib.io import file_io

# The files are gzip-compressed, so the iterator needs matching options.
options = tf.python_io.TFRecordOptions(
    compression_type=tf.python_io.TFRecordCompressionType.GZIP)

# 'url/path' is a placeholder for the directory that holds the files.
sum(1 for f in file_io.get_matching_files(os.path.join('url/path', '*.tfrecord.gz'))
    for example in tf.python_io.tf_record_iterator(f, options=options))

Recommended answer

The filename eval-00000-of-00157.tfrecord.gz means that this is the first of 157 files: shard indices start at 0, so the files run from eval-00000-of-00157 to eval-00156-of-00157. The number in the filename counts files, not records; each file can contain any number of records.
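The naming scheme can be checked with a short sketch. It assumes the files follow the `-SSSSS-of-NNNNN` shard pattern seen in the filename above; the helper names `parse_shard_name` and `expected_shard_names` are hypothetical, chosen only for illustration.

```python
import re

# Matches names such as "eval-00000-of-00157.tfrecord.gz".
_SHARD_RE = re.compile(
    r'^(?P<prefix>.+)-(?P<index>\d+)-of-(?P<total>\d+)(?P<suffix>\..*)?$')


def parse_shard_name(filename):
    """Return (prefix, shard_index, total_shards) for a sharded filename."""
    match = _SHARD_RE.match(filename)
    if match is None:
        raise ValueError('not a sharded filename: %r' % filename)
    return match.group('prefix'), int(match.group('index')), int(match.group('total'))


def expected_shard_names(prefix, total, suffix=''):
    """List every filename a complete set of `total` shards should contain."""
    return ['%s-%05d-of-%05d%s' % (prefix, i, total, suffix) for i in range(total)]


prefix, index, total = parse_shard_name('eval-00000-of-00157.tfrecord.gz')
names = expected_shard_names(prefix, total, '.tfrecord.gz')
print(index, total, len(names), names[-1])
```

Comparing `expected_shard_names(...)` against the files actually present in the output directory is one way to confirm that no shard is missing before counting records.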

If you want to count the records yourself, try something like:

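import os
import tensorflow as tf
from tensorflow.python.lib.io import file_io

# The files are gzip-compressed, so the iterator needs matching options.
options = tf.python_io.TFRecordOptions(
    compression_type=tf.python_io.TFRecordCompressionType.GZIP)

files = os.path.join('gs://my_bucket/my_dir', 'eval-*.tfrecord.gz')
print(sum(1 for f in file_io.get_matching_files(files)
          for _ in tf.python_io.tf_record_iterator(f, options=options)))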

Note that Dataflow makes no guarantee about how the number of output files relates to the number of input files, or about how records are ordered (across files or within a file) between input and output. The total record count, however, should be the same.
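To see how records are spread across individual files, rather than only the total, a per-file count can help. The following is a minimal sketch that assumes the files have been copied to local disk and relies on the TFRecord on-disk layout (an 8-byte little-endian length, a 4-byte length CRC, the payload, and a 4-byte payload CRC). It does not verify the CRCs, and `count_tfrecords` and `count_per_file` are hypothetical helper names.

```python
import gzip
import struct


def count_tfrecords(path):
    """Count records in a gzip-compressed TFRecord file (CRCs are not checked)."""
    count = 0
    with gzip.open(path, 'rb') as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError('truncated record header in %s' % path)
            (length,) = struct.unpack('<Q', header)
            # Read past the length CRC (4 bytes), payload, and payload CRC (4 bytes).
            rest = f.read(4 + length + 4)
            if len(rest) != length + 8:
                raise ValueError('truncated record in %s' % path)
            count += 1
    return count


def count_per_file(paths):
    """Map each path to its record count."""
    return {path: count_tfrecords(path) for path in paths}
```

Summing the values of `count_per_file(...)` over all eval shards should give the same total as the number of rows that were successfully preprocessed from eval_set.csv.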
