unorderable types: str() < int()
Problem description
I was designing a basic spam classifier program (Python 3) using pandas, numpy and sklearn, but I am getting this error and cannot identify where it comes from. I tried checking the data types of different variables but didn't find the location. (ham = not spam.) The input files have nothing to do with this error, since the same code works with Python 2.7; it's either a package/module compatibility problem or a data type casting error.
import os
import io
import numpy
from pandas import DataFrame, concat
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def readFiles(path):
    # Walk the directory tree and yield (file path, message body) for every file.
    for root, dirnames, filenames in os.walk(path):
        for filename in filenames:
            file_path = os.path.join(root, filename)
            inBody = False
            lines = []
            # Emails are read as latin-1; the body starts after the first blank line.
            with io.open(file_path, 'r', encoding='latin1') as f:
                for line in f:
                    if inBody:
                        lines.append(line)
                    elif line == '\n':
                        inBody = True
            message = '\n'.join(lines)
            yield file_path, message


def dataFrameFromDirectory(path, classification):
    # Build one row per email, indexed by the email's file path.
    rows = []
    index = []
    for filename, message in readFiles(path):
        rows.append({'message': message, 'class': classification})
        index.append(filename)
    return DataFrame(rows, index=index)


# DataFrame.append was removed in pandas 2.0; concat builds the same combined frame.
data = concat([dataFrameFromDirectory('D:/emails/spam', 'spam'),
               dataFrameFromDirectory('D:/emails/ham', 'ham')])
Stack trace from the IPython notebook:
TypeError Traceback (most recent call last)
<ipython-input-5-555887356cc2> in <module>()
3 import numpy
4 from pandas import DataFrame
----> 5 from sklearn.feature_extraction.text import CountVectorizer
6 from sklearn.naive_bayes import MultinomialNB
7
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\__init__.py in <module>()
55 else:
56 from . import __check_build
---> 57 from .base import clone
58 __check_build # avoid flakes unused variable error
59
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\base.py in <module>()
10 from scipy import sparse
11 from .externals import six
---> 12 from .utils.fixes import signature
13 from .utils.deprecation import deprecated
14 from .exceptions import ChangedBehaviorWarning as _ChangedBehaviorWarning
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\utils\__init__.py in <module>()
9
10 from .murmurhash import murmurhash3_32
---> 11 from .validation import (as_float_array,
12 assert_all_finite,
13 check_random_state, column_or_1d, check_array,
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\utils\validation.py in <module>()
16
17 from ..externals import six
---> 18 from ..utils.fixes import signature
19 from .deprecation import deprecated
20 from ..exceptions import DataConversionWarning as _DataConversionWarning
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\sklearn\utils\fixes.py in <module>()
404
405
--> 406 if np_version < (1, 12, 0):
407 class MaskedArray(np.ma.MaskedArray):
408 # Before numpy 1.12, np.ma.MaskedArray object is not picklable
TypeError: unorderable types: str() < int()
Recommended answer
With my collection of versions (relatively recent, but not cutting edge):
In [509]: import sklearn
In [510]: sklearn.__version__
Out[510]: '0.17'
In [511]: np.__version__
Out[511]: '1.11.2'
In [512]: sklearn.utils.fixes._parse_version(np.__version__)
Out[512]: (1, 11, 2)
In [513]: sklearn.utils.fixes._parse_version(np.__version__)<(1,12,0)
Out[513]: True
The last step compares a tuple derived from the np.__version__ string with another tuple, (1, 12, 0).
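To make that step concrete, here is a minimal, hypothetical reimplementation of the kind of parser involved (the name `parse_version` is made up for illustration; it is not sklearn's API). It splits the string on dots, converts each piece to `int` when possible and keeps it as a `str` otherwise, which is an assumption consistent with the traceback in this thread. A plain release string then yields an all-integer tuple that compares cleanly with `(1, 12, 0)`.

```python
def parse_version(version_string):
    """Hypothetical sketch: turn '1.11.2' into (1, 11, 2).

    Pieces that are not plain integers (e.g. '0b1' in '1.12.0b1') are kept
    as strings instead of being converted.
    """
    parts = []
    for piece in version_string.split("."):
        try:
            parts.append(int(piece))
        except ValueError:
            # Non-numeric piece such as '0b1' or 'dev0': keep it as text.
            parts.append(piece)
    return tuple(parts)


print(parse_version("1.11.2"))               # (1, 11, 2)
print(parse_version("1.11.2") < (1, 12, 0))  # True
print(parse_version("1.12.0b1"))             # (1, 12, '0b1')
```

The last line shows the problem case: the result mixes integers with the string `'0b1'`.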
I'd suggest importing, and printing, to the extent possible:
np.__version__
scipy.__version__
sys.version
sklearn.__version__
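A small script along these lines collects all four in one go. It is only a sketch, and `collect_versions` is a hypothetical helper name. In the broken environment `import sklearn` itself raises the `TypeError`, so the sketch catches any exception during import and records it instead of crashing.

```python
import importlib
import sys


def collect_versions(module_names=("numpy", "scipy", "sklearn")):
    """Hypothetical helper: map each module name to its __version__.

    A module that fails to import (including the TypeError discussed in
    this thread) is reported as a string instead of stopping the script.
    """
    versions = {"python": sys.version}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # deliberately broad: sklearn raises TypeError here
            versions[name] = "import failed: {!r}".format(exc)
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(name, version)
```

Listing numpy before sklearn means numpy's version is still recorded even when the sklearn import fails.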
Following Andras's link, the problem is the numpy version number. If numpy is a new beta release, the 0b1 part of a version string such as 1.12.0b1 makes this test fail.
In [517]: sklearn.utils.fixes._parse_version('1.12.0b1')<(1,12,0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-517-a2d159f6d08a> in <module>()
----> 1 sklearn.utils.fixes._parse_version('1.12.0b1')<(1,12,0)
TypeError: unorderable types: str() < int()
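The failure can be reproduced without sklearn or numpy, because it is plain Python 3 tuple comparison. Tuples compare element by element, so `(1, 12, '0b1') < (1, 12, 0)` gets past the equal `1` and `12` and then has to order the string `'0b1'` against the integer `0`, which Python 3 refuses. Python 2 allowed mixed str/int ordering, which may be why the same code ran under 2.7. On Python 3.6 and later the message reads `'<' not supported between instances of 'str' and 'int'` instead of `unorderable types`. The helper name `compare_or_report` is hypothetical.

```python
def compare_or_report(left, right):
    """Return left < right, or the TypeError message if Python 3 refuses."""
    try:
        return left < right
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# Both tuples are all integers: ordinary lexicographic comparison.
print(compare_or_report((1, 11, 2), (1, 12, 0)))  # True

# Third elements are '0b1' (str) and 0 (int): Python 3 cannot order them.
print(compare_or_report((1, 12, "0b1"), (1, 12, 0)))
```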
The simplest solution, if possible, is to go back to a regular numpy release (something like '1.11.2') rather than a beta.
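Before downgrading, it can help to confirm that the installed numpy really is a pre-release. The sketch below is a rough heuristic, not a full PEP 440 parser; `looks_like_prerelease` is a hypothetical name, and the marker list (`a`, `b`, `rc`, `dev`) is an assumption that covers common tags such as the `0b1` in `1.12.0b1`.

```python
import re

# Assumed pre-release markers: alpha, beta, release candidate, dev builds.
_PRERELEASE = re.compile(r"(a|b|rc|dev)\d*", re.IGNORECASE)


def looks_like_prerelease(version_string):
    """Heuristic: True if the version string carries a pre-release tag."""
    return _PRERELEASE.search(version_string) is not None


print(looks_like_prerelease("1.11.2"))    # False
print(looks_like_prerelease("1.12.0b1"))  # True
```

Passing `numpy.__version__` to it gives a quick yes/no; if it returns True, pinning a regular release with pip (for example `pip install "numpy==1.11.2"`) is one way to follow the advice above.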
The negative votes for the OP are unfair if this is indeed a numpy/sklearn version issue.