How to use datasets.fetch_mldata() in sklearn?

Question

I am trying to run the following code for a simple machine learning algorithm:

import re
import argparse
import csv
from collections import Counter
from sklearn import datasets
import sklearn
from sklearn.datasets import fetch_mldata

dataDict = datasets.fetch_mldata('MNIST Original')

In this piece of code, I am trying to read the dataset 'MNIST Original', hosted at mldata.org, via sklearn. This results in the following error (the script has more lines, but the error occurs at this particular line):

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.3\helpers\pydev\pydevd.py", line 1481, in <module>
    debugger.run(setup['file'], None, None)
  File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.3\helpers\pydev\pydevd.py", line 1124, in run
    pydev_imports.execfile(file, globals, locals) #execute the script
  File "C:/Users/sony/PycharmProjects/Machine_Learning_Homework1/zeroR.py", line 131, in <module>
    dataDict = datasets.fetch_mldata('MNIST Original')
  File "C:\Anaconda\lib\site-packages\sklearn\datasets\mldata.py", line 157, in fetch_mldata
    matlab_dict = io.loadmat(matlab_file, struct_as_record=True)
  File "C:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 176, in loadmat
    matfile_dict = MR.get_variables(variable_names)
  File "C:\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 294, in get_variables
    res = self.read_var_array(hdr, process)
  File "C:\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 257, in read_var_array
    return self._matrix_reader.array_from_header(header, process)
  File "mio5_utils.pyx", line 624, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5717)
  File "mio5_utils.pyx", line 653, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5147)
  File "mio5_utils.pyx", line 721, in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex (scipy\io\matlab\mio5_utils.c:6134)
  File "mio5_utils.pyx", line 424, in scipy.io.matlab.mio5_utils.VarReader5.read_numeric (scipy\io\matlab\mio5_utils.c:3704)
  File "mio5_utils.pyx", line 360, in scipy.io.matlab.mio5_utils.VarReader5.read_element (scipy\io\matlab\mio5_utils.c:3429)
  File "streams.pyx", line 181, in scipy.io.matlab.streams.FileStream.read_string (scipy\io\matlab\streams.c:2711)
IOError: could not read bytes

I have tried researching this on the internet, but there is hardly any help available. Any expert help with solving this error would be much appreciated.

TIA.

Recommended answer

As of version 0.20, sklearn deprecates the fetch_mldata function and adds fetch_openml as its replacement (fetch_mldata was removed entirely in version 0.22).

Use the following code to download the MNIST dataset:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')

There are some changes to the format though. For instance, mnist['target'] is an array of string category labels (not floats as before).
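If downstream code expects numeric labels, the string labels can be converted after loading. The following is a minimal sketch rather than a definitive recipe: the helper names labels_to_int and load_mnist are hypothetical, and it assumes a scikit-learn version whose fetch_openml accepts the as_frame parameter (0.22 or later, as far as I know) along with network access on the first call.

```python
import numpy as np


def labels_to_int(target):
    """Convert string digit labels such as '5' into an integer array."""
    # np.asarray also accepts a pandas Series or a plain list of strings.
    return np.asarray(target).astype(np.int64)


def load_mnist():
    """Download MNIST from OpenML and return (data, integer labels).

    Hypothetical helper: needs network access the first time it runs;
    scikit-learn caches the download afterwards.
    """
    # Imported here so labels_to_int stays usable without scikit-learn.
    from sklearn.datasets import fetch_openml

    # as_frame=False asks for NumPy arrays instead of a pandas DataFrame
    # (assumption: the installed scikit-learn supports this parameter).
    mnist = fetch_openml('mnist_784', version=1, as_frame=False)
    return mnist['data'], labels_to_int(mnist['target'])
```

Calling X, y = load_mnist() should then give the pixel data in X and integer labels in y, which is closer to what code written against fetch_mldata expected.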
