sklearn 没有属性“数据集" [英] sklearn doesn't have attribute 'datasets'
问题描述
I have started using scikit-learn for my work, so I was going through the tutorial, which gives the standard procedure for loading some datasets:
$ python
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> digits = datasets.load_digits()
However, for my convenience, I tried loading the data in the following way:
In [1]: import sklearn
In [2]: iris = sklearn.datasets.load_iris()
However, this throws the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-db77d2036db5> in <module>()
----> 1 iris = sklearn.datasets.load_iris()
AttributeError: 'module' object has no attribute 'datasets'
However, if I use the apparently similar method:
In [3]: from sklearn import datasets
In [4]: iris = datasets.load_iris()
It works without problems. In fact, the following also works:
In [5]: iris = sklearn.datasets.load_iris()
I am completely confused about this. Am I missing something very trivial? What is the difference between the two approaches?
Recommended answer
sklearn is a package. This answer said it very succinctly:
when you import a package, only variables/functions/classes in the __init__.py file of that package are directly visible, not sub-packages or modules.
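A minimal sketch of that rule, using only the standard library: it builds a throwaway package in a temporary directory. The name toy_pkg_a is hypothetical and stands in for sklearn, and its submodule sub stands in for datasets. After importing the package, only the variable defined in __init__.py is reachable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a hypothetical throwaway package on disk:
#   toy_pkg_a/__init__.py  defines a plain variable
#   toy_pkg_a/sub.py       a submodule that __init__.py does NOT import
root = Path(tempfile.mkdtemp())
pkg_dir = root / "toy_pkg_a"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text('greeting = "hello"\n')
(pkg_dir / "sub.py").write_text("def load():\n    return 42\n")

# Make the temporary directory importable and drop stale finder caches.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

toy_pkg_a = importlib.import_module("toy_pkg_a")

# Names defined in __init__.py are visible right away...
print(toy_pkg_a.greeting)          # hello
# ...but the submodule is not, because nothing has imported it yet.
print(hasattr(toy_pkg_a, "sub"))   # False
```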
datasets is a sub-package of sklearn. This is why this happens:
In [1]: import sklearn
In [2]: sklearn.datasets
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-325a2bfc35d0> in <module>()
----> 1 sklearn.datasets
AttributeError: module 'sklearn' has no attribute 'datasets'
However, the reason this works:
In [3]: from sklearn import datasets
In [4]: sklearn.datasets
Out[4]: <module 'sklearn.datasets' from '/home/ethan/.virtualenvs/test3/lib/python3.5/site-packages/sklearn/datasets/__init__.py'>
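is that when you load the sub-package datasets by doing from sklearn import datasets, the import system automatically binds it as the attribute datasets in the namespace of the parent package sklearn. That is also why sklearn.datasets.load_iris() worked in the question once the from-import had run. This is one of the lesser-known "traps" of the Python import system.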
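The same binding can be reproduced without scikit-learn. This sketch uses a hypothetical throwaway package, toy_pkg_b, with an empty __init__.py and a datasets.py submodule. Its load_iris function is only a stand-in, not the real scikit-learn function.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway package standing in for sklearn / sklearn.datasets.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "toy_pkg_b"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")  # does not import the submodule
(pkg_dir / "datasets.py").write_text("def load_iris():\n    return 'iris'\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

toy_pkg_b = importlib.import_module("toy_pkg_b")
before = hasattr(toy_pkg_b, "datasets")  # False: submodule not loaded yet

# Loading the submodule also binds it as an attribute on the parent package.
from toy_pkg_b import datasets

after = hasattr(toy_pkg_b, "datasets")   # True
print(before, after)                     # False True
print(toy_pkg_b.datasets is datasets)    # True
print(toy_pkg_b.datasets.load_iris())    # iris
```

Writing import sklearn.datasets triggers the same binding, so it is the usual way to keep the fully qualified sklearn.datasets.load_iris() spelling. The binding also happens regardless of who performs the import. If another library you imported earlier already loaded sklearn.datasets, the attribute access will appear to work even though your own code never imported it.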
Also, note that if you look at the __init__.py for sklearn, you will see 'datasets' as a member of __all__, but this only allows you to do:
In [1]: from sklearn import *
In [2]: datasets
Out[2]: <module 'sklearn.datasets' from '/home/ethan/.virtualenvs/test3/lib/python3.5/site-packages/sklearn/datasets/__init__.py'>
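Here is a small sketch of the __all__ behaviour, again using a hypothetical throwaway package (toy_pkg_c) rather than scikit-learn itself. Listing datasets in __all__ does not load it on a plain import. Only the star-import loads it, and as a side effect that also binds the submodule on the parent package. The star-import runs through exec in a fresh namespace, which is purely a convenience of this sketch.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package whose __init__.py lists a submodule in __all__
# but never imports it.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "toy_pkg_c"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text('__all__ = ["datasets"]\n')
(pkg_dir / "datasets.py").write_text("NAME = 'datasets'\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

toy_pkg_c = importlib.import_module("toy_pkg_c")
print(hasattr(toy_pkg_c, "datasets"))   # False: __all__ alone loads nothing

# Star-imports are only allowed at module level, so run one in a fresh
# module-level namespace to avoid polluting this one.
namespace = {}
exec("from toy_pkg_c import *", namespace)
print(namespace["datasets"].NAME)       # datasets
print(hasattr(toy_pkg_c, "datasets"))   # True: the star-import loaded it
```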
One last point to note is that if you inspect either sklearn or datasets, you will see that, although they are packages, their type is module. This is because all packages are considered modules; however, not all modules are packages.
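This can be checked with two standard-library modules, assuming a regular CPython install where json is a package and math is a plain module. The import system treats any module that has a __path__ attribute as a package.

```python
import json   # a package in the standard library (json/__init__.py)
import math   # a plain module, not a package
import types

# Both are instances of the same module type...
print(type(json) is types.ModuleType, type(math) is types.ModuleType)  # True True

# ...but only a package has __path__, which the import system uses
# to locate its submodules.
print(hasattr(json, "__path__"))   # True
print(hasattr(math, "__path__"))   # False
```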