Effective way to read images from a csv file and return a tf.data.Dataset object
Question
I have a csv file that contains two columns:
- the file path of an image stored as a numpy array
- the label of the image
Each row in the csv corresponds to one item (sample).
I want to create a tf.data pipeline that reads each file path and loads the numpy array together with the label associated with it. How would I go about this so that I can return a tf.data.Dataset object?
The documentation on the website is not very informative and I cannot figure out where to start.
Recommended answer
One way to do this is simply to load the arrays and their labels into variables and use tf.data.Dataset.from_tensor_slices (see https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays).
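As a rough illustration of this first approach, the sketch below reads the csv with Python's csv module, loads every array with np.load, and passes the stacked result to tf.data.Dataset.from_tensor_slices. It assumes TensorFlow 2.x, a header-less csv with rows of the form path,label, integer labels, and arrays that all share one shape. The function name dataset_from_csv_in_memory and the throwaway sample files are hypothetical and exist only to make the example runnable.

```python
import csv
import os
import tempfile

import numpy as np
import tensorflow as tf


def dataset_from_csv_in_memory(csv_path):
    """Load every array listed in the csv into memory and wrap it in a Dataset.

    Assumption: header-less csv, rows are `path,label`, each path is a .npy
    file, labels are integers, and all arrays have the same shape.
    """
    images, labels = [], []
    with open(csv_path, newline="") as f:
        for path, label in csv.reader(f):
            images.append(np.load(path))
            labels.append(int(label))
    images = np.stack(images).astype(np.float32)
    labels = np.asarray(labels, dtype=np.int64)
    # Each element of the resulting dataset is one (image, label) pair.
    return tf.data.Dataset.from_tensor_slices((images, labels))


# Build a tiny throwaway csv and three .npy files so the sketch runs end to end.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "samples.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    for i in range(3):
        npy_path = os.path.join(tmp_dir, f"img_{i}.npy")
        np.save(npy_path, np.full((4, 4, 3), i, dtype=np.float32))
        writer.writerow([npy_path, i])

in_memory_ds = dataset_from_csv_in_memory(csv_path)
```

Because every array is loaded up front, this only suits data that fits comfortably in memory.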
Another way is to build a dataset from the file paths and map a loading function over it, so that the pipeline reads each file and returns it as (img, label). Here is the sample code from https://www.tensorflow.org/tutorials/load_data/images:
import tensorflow as tf

# preprocess_image, all_image_paths and all_image_labels are assumed to be defined as in the linked tutorial.
def load_and_preprocess_image(path):
    image = tf.io.read_file(path)  # named tf.read_file in TensorFlow 1.x
    return preprocess_image(image)

ds = tf.data.Dataset.from_tensor_slices((all_image_paths, all_image_labels))

# The tuples are unpacked into the positional arguments of the mapped function
def load_and_preprocess_from_path_label(path, label):
    return load_and_preprocess_image(path), label

image_label_ds = ds.map(load_and_preprocess_from_path_label)
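The tutorial code above reads encoded image files, whereas the question stores each image as a numpy array. One possible adaptation, offered as a sketch and not as the tutorial's own method, is to wrap np.load in tf.numpy_function so that each file is loaded lazily inside the map step. It assumes TensorFlow 2.4 or later (for tf.data.AUTOTUNE), a header-less csv with rows of the form path,label, integer labels, and a known fixed array shape. IMAGE_SHAPE, the helper names, and the throwaway sample files are hypothetical.

```python
import csv
import os
import tempfile

import numpy as np
import tensorflow as tf

IMAGE_SHAPE = (4, 4, 3)  # assumption: every stored array has this shape


def _load_npy(path):
    # Inside tf.numpy_function a scalar string tensor arrives as bytes.
    return np.load(path.decode("utf-8")).astype(np.float32)


def load_array_and_label(path, label):
    image = tf.numpy_function(_load_npy, [path], tf.float32)
    # numpy_function drops static shape information, so restore it here.
    image.set_shape(IMAGE_SHAPE)
    return image, label


def dataset_from_csv_lazy(csv_path, batch_size=2):
    """Keep only paths and labels in memory; load arrays on demand."""
    paths, labels = [], []
    with open(csv_path, newline="") as f:
        for path, label in csv.reader(f):
            paths.append(path)
            labels.append(int(label))
    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    ds = ds.map(load_array_and_label, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)


# Build a tiny throwaway csv and three .npy files so the sketch runs end to end.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "samples.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    for i in range(3):
        npy_path = os.path.join(tmp_dir, f"img_{i}.npy")
        np.save(npy_path, np.full(IMAGE_SHAPE, i, dtype=np.float32))
        writer.writerow([npy_path, i])

lazy_ds = dataset_from_csv_lazy(csv_path)
```

Only the path strings and labels are held in memory here; the arrays themselves are read from disk as batches are requested.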
I would prefer the second way if the data is too big to fit in memory, but the first one is handy for small data.