TFRecords occupy more space than original JPEG images
Question
I'm trying to convert my JPEG image set to TFRecords, but the TFRecord file takes almost 5x more space than the image set. After a lot of googling, I learned that when JPEGs are written into TFRecords, they aren't JPEGs anymore. However, I haven't come across an understandable code solution to this problem. Please tell me what changes should be made in the code below to write JPEGs to TFRecords.
import sys

import tensorflow as tf
from matplotlib.image import imread


def print_progress(count, total):
    pct_complete = float(count) / total
    msg = "\r- Progress: {0:.1%}".format(pct_complete)
    sys.stdout.write(msg)
    sys.stdout.flush()


def wrap_int64(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def wrap_bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def convert(image_paths, labels, out_path):
    # Args:
    # image_paths   List of file-paths for the images.
    # labels        Class-labels for the images.
    # out_path      File-path for the TFRecords output file.

    print("Converting: " + out_path)

    # Number of images. Used when printing the progress.
    num_images = len(image_paths)

    # Open a TFRecordWriter for the output-file.
    with tf.python_io.TFRecordWriter(out_path) as writer:

        # Iterate over all the image-paths and class-labels.
        for i, (path, label) in enumerate(zip(image_paths, labels)):
            # Print the percentage-progress.
            print_progress(count=i, total=num_images-1)

            # Load the image-file using matplotlib's imread function.
            img = imread(path)

            # Convert the image to raw bytes.
            img_bytes = img.tostring()

            # Create a dict with the data we want to save in the
            # TFRecords file. You can add more relevant data here.
            data = \
                {
                    'image': wrap_bytes(img_bytes),
                    'label': wrap_int64(label)
                }

            # Wrap the data as TensorFlow Features.
            feature = tf.train.Features(feature=data)

            # Wrap again as a TensorFlow Example.
            example = tf.train.Example(features=feature)

            # Serialize the data.
            serialized = example.SerializeToString()

            # Write the serialized data to the TFRecords file.
            writer.write(serialized)
Can someone please answer this?
Recommended answer
Instead of converting the image to an array and back to bytes, we can just use the built-in open function to read the file's bytes. That way, the compressed image is written into the TFRecord.
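To see where the extra space comes from, it helps to compare sizes: a decoded 8-bit image occupies height × width × channels bytes, no matter how small the JPEG file was. A rough sketch of that arithmetic follows; the 224×224 RGB shape and the 30 KB file size are purely illustrative assumptions, not figures from the question.

```python
def raw_size_bytes(height, width, channels, bytes_per_value=1):
    """Size of a decoded image array when dumped as raw bytes (uint8 by default)."""
    return height * width * channels * bytes_per_value


# Hypothetical example values, not taken from the question.
decoded = raw_size_bytes(224, 224, 3)   # pixels stored uncompressed
jpeg_file = 30_000                      # assumed size of the .jpg on disk, in bytes

print(decoded)                          # 150528
print(round(decoded / jpeg_file, 1))    # 5.0
```

With numbers in this range, a blow-up of roughly 5x is what one would expect when raw pixel data is stored instead of the compressed file.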
Replace these two lines
img = imread(path)
img_bytes = img.tostring()
with
img_bytes = open(path,'rb').read()
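A slightly more defensive variant of that replacement line, as a minimal sketch: it closes the file handle with a with block and lets you check for the JPEG start-of-image marker (bytes FF D8) before the data goes into the record. The helper names read_encoded_image and looks_like_jpeg are hypothetical; they are not part of TensorFlow or of the original code.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker at the beginning of a JPEG file


def read_encoded_image(path):
    """Return the file's bytes exactly as stored on disk (still compressed)."""
    with open(path, "rb") as f:
        return f.read()


def looks_like_jpeg(data):
    """Cheap sanity check: does the byte string start with the JPEG marker?"""
    return data[:2] == JPEG_SOI
```

In convert, img_bytes = read_encoded_image(path) would take the place of the two lines above. Because the record then holds encoded JPEG data, the reading side has to decode it with a JPEG decoder such as tf.image.decode_jpeg instead of reinterpreting the bytes as raw pixels.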
Reference:
https://github.com/tensorflow/tensorflow/issues/9675