How to "explicitly specify the categories order by passing in a categories argument" when using tuples as index keys in pandas?
Question
I've been trying to figure out how to use these tuples as index keys in pandas, but I'm getting an error.
How can I use the suggestion in the error message below, together with pd.Categorical, to fix this error?
I know I could convert the keys to strings, but I'm curious what the suggestion in the error message actually means.
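For reference, the `categories` parameter of `pd.Categorical` lists the allowed values in the order you want. As far as I can tell, when it is supplied pandas maps each value to its position in that list instead of sorting the observed values, so values that cannot be compared with `<` never need to be compared. A minimal illustration with made-up string values (not the question's data):

```python
import pandas as pd

# Made-up values for illustration only.
values = ["sqrt", "log2", "sqrt", "auto"]

# Without `categories`, pandas infers the categories and sorts them.
inferred = pd.Categorical(values, ordered=True)

# With `categories`, the given order is used as-is; no sorting is needed.
explicit = pd.Categorical(values, categories=["sqrt", "log2", "auto"], ordered=True)

print(list(inferred.categories))  # ['auto', 'log2', 'sqrt']
print(list(explicit.categories))  # ['sqrt', 'log2', 'auto']
print(explicit.codes.tolist())    # [0, 1, 0, 2]
```

Note that in the traceback below the `Categorical` is created internally (`_factorize_from_iterable` calls `Categorical(values, ordered=True)`), so there is no direct place to pass `categories` through `pd.Index` itself; the message reads as generic advice from the `Categorical` constructor.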
This works perfectly fine when I run it with 0.22.0. I've opened a GitHub issue for this if anyone wants to see the proper output from 0.22.0.
I want to upgrade pandas and handle this problem properly.
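The 40 keys in the snippet below form a full grid of three hyperparameters (2 criterion values × 4 max_features values × 5 min_samples_leaf values). If it helps to see that structure, here is a sketch that rebuilds the same list with `itertools.product`; the `grid` dict is a hypothetical reconstruction from the listed keys, not code from the question:

```python
from itertools import product

# Hypothetical reconstruction of the question's grid; values copied from the keys.
grid = {
    "criterion": ["gini", "entropy"],
    "max_features": ["log2", "sqrt", None, 0.382],
    "min_samples_leaf": [1, 2, 3, 5, 8],
}

# Each key is a tuple of (name, value) pairs; product varies the last
# parameter fastest, matching the order of the list in the question.
index = [tuple(zip(grid, combo)) for combo in product(*grid.values())]

print(len(index))  # 40
print(index[0])    # (('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 1))
```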
import sys; sys.version
# '3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]'
import pandas as pd; pd.__version__
# '0.23.4'
index = [(('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 1)), (('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 2)), (('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 3)), (('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 5)), (('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 8)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 1)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 2)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 3)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 5)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 8)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 1)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 2)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 3)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 5)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 8)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 1)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 2)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 3)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 5)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 8)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 1)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 2)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 3)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 5)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 8)), (('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 1)), 
(('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 2)), (('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 3)), (('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 5)), (('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 8)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 1)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 2)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 3)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 5)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 8)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 1)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 2)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 3)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 5)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 8))]
len(index)
# 40
pd.Index(index)
Traceback (most recent call last):
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/algorithms.py", line 635, in factorize
order = uniques.argsort()
TypeError: '<' not supported between instances of 'NoneType' and 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/sorting.py", line 451, in safe_sort
sorter = values.argsort()
TypeError: '<' not supported between instances of 'NoneType' and 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 345, in __init__
codes, categories = factorize(values, sort=True)
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/algorithms.py", line 643, in factorize
assume_unique=True)
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/sorting.py", line 455, in safe_sort
ordered = sort_mixed(values)
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/sorting.py", line 441, in sort_mixed
nums = np.sort(values[~str_pos])
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 847, in sort
a.sort(axis=axis, kind=kind, order=order)
TypeError: '<' not supported between instances of 'NoneType' and 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 449, in __new__
data, names=name or kwargs.get('names'))
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 1330, in from_tuples
return MultiIndex.from_arrays(arrays, sortorder=sortorder, names=names)
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 1274, in from_arrays
labels, levels = _factorize_from_iterables(arrays)
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2543, in _factorize_from_iterables
return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2543, in <listcomp>
return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2515, in _factorize_from_iterable
cat = Categorical(values, ordered=True)
File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 351, in __init__
raise TypeError("'values' is not ordered, please "
TypeError: 'values' is not ordered, please explicitly specify the categories order by passing in a categories argument
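The traceback points at the cause: `pd.Index` turns a list of tuples into a MultiIndex (`from_tuples`), so each level holds inner pairs such as `('max_features', 'log2')` and `('max_features', None)`, and each level is built with `Categorical(values, ordered=True)`, which sorts those pairs. The pure-Python sketch below involves no pandas; the explicit-order lookup is a simplified model of what a `categories` argument provides, not pandas' actual code:

```python
# The middle level of the question's index holds pairs like these.
level_values = [
    ("max_features", "log2"),
    ("max_features", "sqrt"),
    ("max_features", None),
    ("max_features", 0.382),
]

# Tuples compare element by element: the first items tie, so Python then
# compares "log2" with None (or with 0.382), which Python 3 rejects.
try:
    sorted(level_values)
    sort_error = None
except TypeError as exc:
    sort_error = str(exc)

# Simplified model of an explicit categories order: first-seen order,
# deduplicated, then each value is looked up by equality, never by "<".
explicit_order = list(dict.fromkeys(level_values))
codes = [explicit_order.index(value) for value in level_values]

print(sort_error)
print(codes)  # [0, 1, 2, 3]
```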
Recommended answer
I wish the error message were a little more informative. Thanks to the answers above, I was able to figure out the issue. I ended up doing this, which works in both versions:
pandas v0.23.4
>>> pd.__version__
'0.23.4'
>>> index_categorical = pd.Index([*map(frozenset, index)], dtype="category")
>>> dict(index_categorical[0])
{'criterion': 'gini', 'max_features': 'log2', 'min_samples_leaf': 1}
pandas v0.22.0
>>> pd.__version__
'0.22.0'
>>> index_categorical = pd.Index([*map(frozenset, index)], dtype="category")
>>> dict(index_categorical[0])
{'min_samples_leaf': 1, 'criterion': 'gini', 'max_features': 'log2'}
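The answer does not spell out why the frozenset workaround helps. Presumably it is because a frozenset is not a tuple, so `pd.Index` no longer splits each key into MultiIndex levels, and two frozensets can always be compared with `<` (it is a subset test), so sorting them does not raise. A small pure-Python check of those properties, using keys from the question:

```python
# Keys from the question: tuples of (name, value) pairs.
key = (("criterion", "gini"), ("max_features", "log2"), ("min_samples_leaf", 1))
other = (("criterion", "gini"), ("max_features", None), ("min_samples_leaf", 1))

label = frozenset(key)

# Hashable and independent of pair order, so it can serve as an index label.
print(hash(label) == hash(frozenset(reversed(key))))  # True

# "<" between frozensets is a proper-subset test: it returns a bool instead of
# comparing None with a str, so sorting such labels does not raise.
print(label < frozenset(other))  # False

# dict() accepts any iterable of pairs, so the parameters round-trip.
params = dict(label)
print(params == {"criterion": "gini", "max_features": "log2", "min_samples_leaf": 1})  # True
```

Because a set has no defined iteration order (and string hashing is randomized per process by default), the key order printed by `dict(...)` can vary between sessions, which likely explains the different orders shown for 0.22.0 and 0.23.4.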