Extract images from .idx3-ubyte file or GZIP via Python
Question
I have created a simple face recognition function using the FaceRecognizer from OpenCV. It works fine with images of people.
Now I would like to test it with handwritten characters instead of people. I came across the MNIST dataset, but it stores images in an unusual file format I have never seen before.
I simply need to extract a few images from:
train-images.idx3-ubyte
and save them as .gif.
Or am I misunderstanding this MNIST thing? If so, where could I get such a dataset?
Edit
I also have the gzip file:
train-images-idx3-ubyte.gz
I am trying to read the content, but show() does not work, and if I read() it I just see random symbols.
import gzip
images = gzip.open("train-images-idx3-ubyte.gz", 'rb')
print(images.read())
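The "random symbols" are expected: the file is binary, not text. In the IDX format described on the MNIST site, an image file starts with four big-endian unsigned 32-bit integers (magic number 2051, number of images, rows, columns), followed by one unsigned byte per pixel. A minimal sketch of decoding that header with the standard struct module; the function name is hypothetical:

```python
import struct

def parse_idx3_header(header):
    """Parse the 16-byte IDX3 header and return (count, rows, cols).

    The header is four big-endian unsigned 32-bit integers:
    magic number, number of images, rows per image, columns per image.
    """
    magic, count, rows, cols = struct.unpack(">IIII", header[:16])
    if magic != 2051:  # 0x00000803: unsigned-byte data, 3 dimensions
        raise ValueError(f"not an IDX3 image file (magic={magic})")
    return count, rows, cols

# Usage:
# import gzip
# with gzip.open("train-images-idx3-ubyte.gz", "rb") as f:
#     print(parse_idx3_header(f.read(16)))
```

For the MNIST training file this should report 60000 images of 28x28 pixels.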
Edit
I managed to get some useful output by using:
import gzip
with gzip.open('train-images-idx3-ubyte.gz', 'r') as fin:
    for line in fin:
        print('got line', line)
Somehow I now have to convert this output into an image.
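Iterating "by line" splits the binary data at arbitrary 0x0A bytes, so it does not correspond to image rows. Converting the data into pictures instead means skipping the 16-byte header and slicing rows*cols bytes per image. A minimal stdlib-only sketch that saves the first few images; it writes binary PGM files as a stand-in for the .gif the question asks for, since GIF encoding needs a third-party library such as Pillow. The function name and output file names are hypothetical:

```python
import gzip
import os
import struct

def extract_images(gz_path, count, out_dir=".", prefix="digit"):
    """Save the first `count` images of a gzipped IDX3 file as binary PGM files.

    Returns the list of paths written.
    """
    paths = []
    with gzip.open(gz_path, "rb") as f:
        magic, total, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError(f"not an IDX3 image file (magic={magic})")
        for i in range(min(count, total)):
            pixels = f.read(rows * cols)  # one unsigned byte per pixel, row by row
            path = os.path.join(out_dir, f"{prefix}_{i}.pgm")
            with open(path, "wb") as out:
                # Binary PGM ("P5"): width, height, max gray value, then raw pixels.
                out.write(f"P5 {cols} {rows} 255\n".encode("ascii"))
                out.write(pixels)
            paths.append(path)
    return paths

# Usage: extract_images("train-images-idx3-ubyte.gz", 10)
```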
Recommended answer
Download the training/test images and labels:
- train-images-idx3-ubyte.gz: training set images
- train-labels-idx1-ubyte.gz: training set labels
- t10k-images-idx3-ubyte.gz: test set images
- t10k-labels-idx1-ubyte.gz: test set labels
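The label files use the same IDX layout with a shorter header: the magic number 2049 and the label count, each a big-endian 32-bit integer, followed by one unsigned byte (0 to 9) per label. A minimal stdlib sketch for reading them; the function names are hypothetical:

```python
import gzip
import struct

def parse_idx1_labels(data):
    """Parse the bytes of an IDX1 label file into a list of ints."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:  # 0x00000801: unsigned-byte data, 1 dimension
        raise ValueError(f"not an IDX1 label file (magic={magic})")
    labels = list(data[8:8 + count])  # indexing bytes yields ints
    if len(labels) != count:
        raise ValueError("label file is truncated")
    return labels

def load_labels(gz_path):
    """Decompress and parse a file such as train-labels-idx1-ubyte.gz."""
    with gzip.open(gz_path, "rb") as f:
        return parse_idx1_labels(f.read())
```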
Then uncompress them into a working directory, say samples/.
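To do the uncompressing from Python rather than with gunzip or an archive tool, a minimal sketch using the standard gzip and shutil modules. The function name is hypothetical, and the usage line assumes the four .gz files sit in the current directory:

```python
import gzip
import os
import shutil

def gunzip_all(src_dir, dest_dir):
    """Decompress every *.gz file in src_dir into dest_dir, dropping the .gz suffix."""
    os.makedirs(dest_dir, exist_ok=True)
    out_paths = []
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith(".gz"):
            continue
        out_path = os.path.join(dest_dir, name[:-3])
        with gzip.open(os.path.join(src_dir, name), "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams, so the whole file need not fit in memory
        out_paths.append(out_path)
    return out_paths

# Usage: gunzip_all(".", "samples")
```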
Get the python-mnist package from PyPI:
pip install python-mnist
Import the mnist package and read the training/test images:
from mnist import MNIST
mndata = MNIST('samples')
images, labels = mndata.load_training()
# or
images, labels = mndata.load_testing()
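Since images and labels come back as parallel sequences, picking out a few examples of one digit (for instance, to feed the recognizer from the question) is a zip plus a filter. A minimal sketch; examples_of is a hypothetical helper that works on any parallel sequences, including those returned by load_training():

```python
from itertools import islice

def examples_of(digit, images, labels, limit=5):
    """Return up to `limit` images whose label equals `digit`."""
    matches = (img for img, lab in zip(images, labels) if lab == digit)
    return list(islice(matches, limit))

# Usage with python-mnist:
# images, labels = mndata.load_training()
# threes = examples_of(3, images, labels)
```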
To display an image to the console:
import random
index = random.randrange(0, len(images))  # choose an index ;-)
print(mndata.display(images[index]))
You will get something like this:
............................
............................
............................
............................
............................
.................@@.........
..............@@@@@.........
............@@@@............
..........@@................
..........@.................
...........@................
...........@................
...........@...@............
...........@@@@@.@..........
...........@@@...@@.........
...........@@.....@.........
..................@.........
..................@@........
..................@@........
..................@.........
.................@@.........
...........@.....@..........
...........@....@@..........
............@@@@............
.............@..............
............................
............................
............................
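The picture above is a per-pixel threshold: bright pixels print as @, the rest as a dot, 28 characters per row. A stdlib sketch that produces the same kind of rendering from a flat list of pixels; the function name and the threshold of 200 are assumptions for illustration, not necessarily what python-mnist does internally:

```python
def render_ascii(image, width=28, threshold=200):
    """Render a flat list of 0-255 pixel values as rows of '@' and '.'."""
    rows = []
    for start in range(0, len(image), width):
        row = image[start:start + width]
        rows.append("".join("@" if p > threshold else "." for p in row))
    return "\n".join(rows)

# Usage: print(render_ascii(images[index]))
```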
Notes:
- Each image in the images list is a Python list of unsigned bytes.
- The labels are a Python array of unsigned bytes.
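Because each image is a plain sequence of integers from 0 to 255, bytes(image) turns it back into raw pixel data that an image writer can consume. A minimal sketch that wraps it in a binary PGM header, used here as a stdlib-only stand-in for the .gif the question asks for (an imaging library such as Pillow could convert the result). The helper name and the file name in the usage comment are hypothetical:

```python
def image_to_pgm(image, width=28, height=28):
    """Encode a flat sequence of unsigned bytes (one MNIST image) as binary PGM."""
    if len(image) != width * height:
        raise ValueError(f"expected {width * height} pixels, got {len(image)}")
    header = f"P5 {width} {height} 255\n".encode("ascii")
    return header + bytes(image)  # bytes() accepts ints in the range 0..255

# Usage:
# with open("sample.pgm", "wb") as f:
#     f.write(image_to_pgm(images[index]))
```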