Accessing already downloaded dataset with tensorflow_datasets API

Problem Description

I am trying to work with the recently published tensorflow_datasets API to train a Keras model on the Open Images Dataset. The dataset is about 570 GB in size. I downloaded the data with the following code:

import tensorflow_datasets as tfds
import tensorflow as tf

# Download the raw archives to download_dir and prepare the dataset.
open_images_dataset = tfds.image.OpenImagesV4()
open_images_dataset.download_and_prepare(download_dir="/notebooks/dataset/")
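
For context: in tensorflow-datasets 1.x, download_and_prepare(download_dir=...) only controls where the raw archives are downloaded and extracted; the prepared dataset is written to the builder's data_dir, which defaults to ~/tensorflow_datasets. A minimal sketch of pinning both locations explicitly, assuming the 1.x API (paths are examples):

import tensorflow_datasets as tfds

# data_dir sets where the *prepared* dataset is written (this is what
# tfds.load reads later); download_dir only holds the raw archives.
builder = tfds.image.OpenImagesV4(data_dir="/notebooks/open_images_dataset")
builder.download_and_prepare(download_dir="/notebooks/dataset/")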

After the download was complete, the connection to my Jupyter notebook was somehow interrupted, but the extraction seemed to have finished as well; at least every downloaded file had a counterpart in the "extracted" folder. However, I am not able to access the downloaded data now:

tfds.load(name="open_images_v4", data_dir="/notebooks/open_images_dataset/extracted/", download=False)

This only produces the following error:

AssertionError: Dataset open_images_v4: could not find data in /notebooks/open_images_dataset/extracted/. Please make sure to call dataset_builder.download_and_prepare(), or pass download=True to tfds.load() before trying to access the tf.data.Dataset object.

When I call download_and_prepare() again, it just downloads the whole dataset a second time.

Am I missing something here?

After the download, the folder under "extracted" contains 18 .tar.gz files.

Answer

This is with tensorflow-datasets 1.0.1 and tensorflow 2.0.

The folder hierarchy should be like this:

/notebooks/open_images_dataset/extracted/open_images_v4/0.1.0
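
To sanity-check the layout before loading, a quick path check helps; the paths mirror the question above, and the version string 0.1.0 is taken from the hierarchy shown:

from pathlib import Path

# tfds expects the prepared data at <data_dir>/<dataset_name>/<version>.
data_dir = Path("/notebooks/open_images_dataset/extracted")
version_dir = data_dir / "open_images_v4" / "0.1.0"
print(version_dir.exists())  # should print True before calling tfds.load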

Every dataset has a version, so the prepared data lives under a version directory. The data can then be loaded like this:

ds = tfds.load('open_images_v4', data_dir='/notebooks/open_images_dataset/extracted', download=False)
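
As a quick usage check: without a split argument, tfds.load returns a dict of tf.data.Dataset objects keyed by split, so the result can be inspected like this (the "image" feature key is an assumption for illustration):

# Pull one example per split to confirm the data is readable.
for split_name, split_ds in ds.items():
    for example in split_ds.take(1):
        print(split_name, example["image"].shape)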

I didn't have the open_images_v4 data, so I put cifar10 data into a folder named open_images_v4 to check what folder structure tensorflow_datasets was expecting.
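
The same probe works with any small dataset; a hedged sketch (the path is an example):

import tensorflow_datasets as tfds

# Prepare a tiny dataset and inspect the layout tfds creates under data_dir:
# <dataset_name>/<version>/... -- the same shape open_images_v4 needs.
tfds.load("cifar10", data_dir="/tmp/tfds_probe", download=True)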
