Extract images from .idx3-ubyte file or GZIP via Python


Question

I have created a simple face recognition function using the FaceRecognizer from OpenCV. It works fine with images of people.

Now I would like to test it with handwritten characters instead of people. I came across the MNIST dataset, but it stores its images in an unusual file format that I have never seen before.

I simply need to extract a few images from:

train-images.idx3-ubyte

and save them as .gif files.

Or am I misunderstanding this MNIST thing? If so, where could I get such a dataset?

Edit

I also have the gzip file:

train-images-idx3-ubyte.gz

I am trying to read the content, but show() does not work, and if I call read() I see random symbols.

import gzip

images = gzip.open("train-images-idx3-ubyte.gz", 'rb')
print(images.read())  # prints the raw bytes, hence the "random symbols"

Edit

I managed to get some useful output by using:

import gzip

with gzip.open('train-images-idx3-ubyte.gz','r') as fin:
    for line in fin:
        print('got line', line)

Somehow I now have to convert this to an image. Output:

Recommended answer

Download the training/test images and labels:

  • train-images-idx3-ubyte.gz: training set images
  • train-labels-idx1-ubyte.gz: training set labels
  • t10k-images-idx3-ubyte.gz: test set images
  • t10k-labels-idx1-ubyte.gz: test set labels

Then uncompress them into a working directory, say samples/.
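The uncompress step can also be done from Python. The following is a minimal sketch using only the standard library; the helper names gunzip_to and read_idx3_header are made up for illustration and are not part of the original answer. The header check assumes the IDX layout used by the MNIST image files: four big-endian 32-bit integers (magic number 2051, image count, rows, columns), followed by one unsigned byte per pixel.

```python
import gzip
import os
import shutil
import struct


def gunzip_to(src_path, dest_dir):
    """Decompress src_path (a .gz file) into dest_dir and return the new path."""
    os.makedirs(dest_dir, exist_ok=True)
    # Drop the trailing ".gz" to get the output file name.
    name = os.path.basename(src_path)
    if name.endswith(".gz"):
        name = name[:-3]
    dest_path = os.path.join(dest_dir, name)
    with gzip.open(src_path, "rb") as fin, open(dest_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest_path


def read_idx3_header(path):
    """Return (count, rows, cols) from an uncompressed idx3-ubyte file."""
    with open(path, "rb") as f:
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
    if magic != 2051:  # 0x00000803: unsigned bytes, 3 dimensions
        raise ValueError("not an idx3-ubyte image file: magic=%d" % magic)
    return count, rows, cols


# Example usage (assumes the .gz file is in the current directory):
# path = gunzip_to("train-images-idx3-ubyte.gz", "samples")
# print(read_idx3_header(path))
```

For the MNIST training images, the header should report 60000 images of 28 by 28 pixels. If the magic number does not match, the file is probably still compressed or is a label file.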

Get the python-mnist package from PyPI:

pip install python-mnist

Import the mnist package and read the training/test images:

from mnist import MNIST

mndata = MNIST('samples')

images, labels = mndata.load_training()
# or
images, labels = mndata.load_testing()

To display an image in the console:

import random

index = random.randrange(0, len(images))  # choose an index ;-)
print(mndata.display(images[index]))

You will get something like this:

............................
............................
............................
............................
............................
.................@@.........
..............@@@@@.........
............@@@@............
..........@@................
..........@.................
...........@................
...........@................
...........@...@............
...........@@@@@.@..........
...........@@@...@@.........
...........@@.....@.........
..................@.........
..................@@........
..................@@........
..................@.........
.................@@.........
...........@.....@..........
...........@....@@..........
............@@@@............
.............@..............
............................
............................
............................

Explanation:

  • Each image in the images list is a Python list of unsigned bytes.
  • The labels are a Python array of unsigned bytes.
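Since each image is a flat list of 784 unsigned bytes (28 rows of 28 pixels), saving one as a .gif, as the question asks, mostly means telling an imaging library the dimensions. Here is a minimal sketch, assuming the Pillow library is installed (pip install Pillow). Pillow is not part of the original answer, and the helper names image_size and save_gif are made up for illustration.

```python
def image_size(pixels, width=28):
    """Return (width, height) for a flat, row-major list of pixels."""
    height, remainder = divmod(len(pixels), width)
    if remainder:
        raise ValueError("pixel count is not a multiple of the width")
    return width, height


def save_gif(pixels, path, width=28):
    """Save a flat list of 0-255 grayscale values as an image file."""
    from PIL import Image  # imported here so image_size works without Pillow

    size = image_size(pixels, width)
    # "L" is Pillow's 8-bit grayscale mode; bytes() packs the 0-255 integers.
    img = Image.frombytes("L", size, bytes(pixels))
    img.save(path)  # the format is inferred from the ".gif" extension


# Example usage with the `images` list loaded by python-mnist:
# for i in range(5):
#     save_gif(images[i], "digit_%d.gif" % i)
```

MNIST stores the background as 0 and the pen strokes as higher values, so the saved images show light digits on a black background.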
