How to map a dataset of filenames to a dataset of file contents

Question

For example, I have a TensorFlow dataset in which each element is a tf.string Tensor representing the filename of an image file. I want to map this dataset of filenames to a dataset of image content Tensors.

I wrote code like this, but it doesn't work because the function passed to map is not executed eagerly. (It raises an error saying that the Tensor type has no attribute named numpy.)

def parseline(line):
    filename = line.numpy()
    image = some_library.open_image(filename).to_numpy()
    return image

dataset = dataset.map(parseline)

Recommended answer

Basically, it can be done the following way:

import os
import tensorflow as tf

path = 'path_to_images'

# Build a list of filenames if you don't have one yet;
# from_tensor_slices turns it into tf.string tensors for the tf functions
files = [os.path.join(path, i) for i in os.listdir(path)]

def parse_image(filename):
    file = tf.io.read_file(filename)  # filename arrives here as a tf.string tensor
    image = tf.image.decode_image(file)
    return image

dataset = tf.data.Dataset.from_tensor_slices(files)
dataset = dataset.map(parse_image).batch(1)

If you're in eager mode, just iterate over the dataset:

for i in dataset:
    print(i)

If not (TensorFlow 1.x graph mode), you'll need an iterator:

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()  # build the op once, outside the session
with tf.Session() as sess:
    print(sess.run(next_element))
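
If the file really has to be opened by a non-TensorFlow library, as in the question's parseline, one possible route is to wrap the Python code in tf.py_function, which runs it eagerly so that .numpy() is available. The following is only a minimal sketch assuming TensorFlow 2.x: open_image is a hypothetical stand-in for some_library.open_image(...).to_numpy(), and small .npy files are used in place of real images so the example is self-contained.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def open_image(path):
    # Hypothetical stand-in for some_library.open_image(path).to_numpy():
    # any plain-Python loader that returns a NumPy array could go here.
    return np.load(path)


def parse_line(line):
    def load(filename):
        # Inside tf.py_function the argument is an eager tensor, so .numpy() works.
        return open_image(filename.numpy().decode("utf-8"))

    image = tf.py_function(load, inp=[line], Tout=tf.uint8)
    image.set_shape([None, None, 3])  # py_function does not infer a static shape
    return image


# Save two small arrays to disk so the sketch runs without real image files.
tmp_dir = tempfile.mkdtemp()
files = []
for idx in range(2):
    path = os.path.join(tmp_dir, f"img_{idx}.npy")
    np.save(path, np.full((4, 4, 3), idx, dtype=np.uint8))
    files.append(path)

dataset = tf.data.Dataset.from_tensor_slices(files).map(parse_line)
images = [image.numpy() for image in dataset]
```

Code wrapped this way runs in the Python interpreter rather than as graph ops, so the native tf.io / tf.image functions shown in the answer are usually the better choice when they can read the file format.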
