TFRecords occupy more space than original JPEG images

Question

I'm trying to convert my JPEG image set to TFRecords, but the TFRecord file takes almost 5x more space than the image set. After a lot of googling, I learned that when JPEGs are written into TFRecords this way, they are no longer stored as JPEG. However, I haven't come across an understandable code solution to this problem. Please tell me what changes should be made to the code below to write the JPEGs to TFRecords.

import sys

import tensorflow as tf
from matplotlib.image import imread


def print_progress(count, total):
    pct_complete = float(count) / total
    msg = "\r- Progress: {0:.1%}".format(pct_complete)
    sys.stdout.write(msg)
    sys.stdout.flush()

def wrap_int64(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def wrap_bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def convert(image_paths , labels, out_path):
    # Args:
    # image_paths   List of file-paths for the images.
    # labels        Class-labels for the images.
    # out_path      File-path for the TFRecords output file.

    print("Converting: " + out_path)

    # Number of images. Used when printing the progress.
    num_images = len(image_paths)

    # Open a TFRecordWriter for the output-file.
    with tf.python_io.TFRecordWriter(out_path) as writer:

        # Iterate over all the image-paths and class-labels.
        for i, (path, label) in enumerate(zip(image_paths, labels)):
            # Print the percentage-progress.
            print_progress(count=i, total=num_images-1)

            # Load the image-file using matplotlib's imread function.
            img = imread(path)
            # Convert the image to raw bytes.
            img_bytes = img.tostring()

            # Create a dict with the data we want to save in the
            # TFRecords file. You can add more relevant data here.
            data = \
            {
                'image': wrap_bytes(img_bytes),
                'label': wrap_int64(label)
            }

            # Wrap the data as TensorFlow Features.
            feature = tf.train.Features(feature=data)

            # Wrap again as a TensorFlow Example.
            example = tf.train.Example(features=feature)

            # Serialize the data.
            serialized = example.SerializeToString()

            # Write the serialized data to the TFRecords file.
            writer.write(serialized)

Can someone please answer this?

Recommended answer

Instead of decoding the image into an array and converting that array back to bytes, we can use the built-in open function to read the file's bytes directly. That way, the compressed image is what gets written into the TFRecord.
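The size difference can be estimated with simple arithmetic: a decoded image stored as a dense array needs height × width × channels bytes, regardless of how well the JPEG compressed. The sketch below is only an illustration; the helper name raw_size_bytes, the 224×224 dimensions, and the 30,000-byte file size are assumptions, not values from the question.

```python
def raw_size_bytes(height, width, channels=3, bytes_per_value=1):
    """Bytes needed to store a decoded image as a dense array."""
    return height * width * channels * bytes_per_value


# Hypothetical numbers, chosen only for illustration.
jpeg_file_size = 30_000                   # assumed size of the .jpg on disk
decoded_size = raw_size_bytes(224, 224)   # 224 * 224 * 3 bytes

print(decoded_size)                             # 150528
print(round(decoded_size / jpeg_file_size, 1))  # 5.0
```

With these assumed numbers the raw array is about five times larger than the file, which is the same order of growth described in the question.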

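Replace these two lines:

img = imread(path)
img_bytes = img.tostring()

with the following, which reads the file's bytes without decoding them:

with open(path, 'rb') as f:
    img_bytes = f.read()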
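A quick sanity check, sketched here with the standard library only, is to confirm that the bytes read this way match the file size on disk and still begin with the JPEG start-of-image marker (FF D8). The helper names read_encoded_image and looks_like_jpeg are hypothetical, and the temporary file is a stand-in, not a real image.

```python
import os
import tempfile


def read_encoded_image(path):
    """Return the file's bytes exactly as stored on disk (no decoding)."""
    with open(path, 'rb') as f:
        return f.read()


def looks_like_jpeg(data):
    """True if the bytes begin with the JPEG start-of-image marker (FF D8)."""
    return data[:2] == b'\xff\xd8'


# Stand-in file: JPEG start and end markers around filler bytes.
fake_jpeg = b'\xff\xd8' + b'\x00' * 100 + b'\xff\xd9'
with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as tmp:
    tmp.write(fake_jpeg)
    tmp_path = tmp.name

img_bytes = read_encoded_image(tmp_path)
size_matches = len(img_bytes) == os.path.getsize(tmp_path)
is_jpeg = looks_like_jpeg(img_bytes)
os.remove(tmp_path)

print(size_matches, is_jpeg)
```

Because the record now holds encoded JPEG data, the reading side needs a JPEG decoder such as tf.image.decode_jpeg instead of reinterpreting the bytes as a raw array.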

Reference:

https://github.com/tensorflow/tensorflow/issues/9675
