如何“通过传递一个类别参数来明确地指定类别顺序".当使用元组作为 pandas 的索引键时? [英] How to "explicitly specify the categories order by passing in a categories argument" when using tuples as index keys in pandas?

查看:64
本文介绍了如何“通过传递一个类别参数来明确地指定类别顺序".当使用元组作为 pandas 的索引键时?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在尝试弄清楚如何在pandas中使这些元组索引键,但是出现错误.

I've been trying to figure out how to make these tuples index keys in pandas but I'm getting an error.

如何使用下面带有pd.Categorical的错误中的建议来解决此错误?

How can I use the suggestion from the error with pd.Categorical below to fix this error?

我知道我可以转换为字符串,但是我很想知道错误消息中的建议是什么意思?

I am aware that I can convert to a string but I am curious to see what is meant by the suggestion in the error message?

当我用0.22.0运行它时,它工作得很好.如果有人想要查看正确的信息,我已经为此打开了 GitHub问题 0.22.0的输出.

This works perfectly fine when I run it with 0.22.0. I've opened a GitHub issue for this if anyone wants to see the proper output from 0.22.0.

我想更新我的熊猫并适当处理此问题.

I want to update my pandas and handle this problem appropriately.

import sys; sys.version
# '3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]'
import pandas as pd; pd.__version__
# '0.23.4'
index = [(('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 1)), (('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 2)), (('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 3)), (('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 5)), (('criterion', 'gini'), ('max_features', 'log2'), ('min_samples_leaf', 8)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 1)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 2)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 3)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 5)), (('criterion', 'gini'), ('max_features', 'sqrt'), ('min_samples_leaf', 8)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 1)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 2)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 3)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 5)), (('criterion', 'gini'), ('max_features', None), ('min_samples_leaf', 8)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 1)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 2)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 3)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 5)), (('criterion', 'gini'), ('max_features', 0.382), ('min_samples_leaf', 8)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 1)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 2)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 3)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 5)), (('criterion', 'entropy'), ('max_features', 'log2'), ('min_samples_leaf', 8)), (('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 1)), (('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 2)), (('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 3)), (('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 5)), (('criterion', 'entropy'), ('max_features', 'sqrt'), ('min_samples_leaf', 8)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 1)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 2)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 3)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 5)), (('criterion', 'entropy'), ('max_features', None), ('min_samples_leaf', 8)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 1)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 2)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 3)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 5)), (('criterion', 'entropy'), ('max_features', 0.382), ('min_samples_leaf', 8))]
len(index)
# 40
pd.Index(index)
Traceback (most recent call last):
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/algorithms.py", line 635, in factorize
    order = uniques.argsort()
TypeError: '<' not supported between instances of 'NoneType' and 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/sorting.py", line 451, in safe_sort
    sorter = values.argsort()
TypeError: '<' not supported between instances of 'NoneType' and 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 345, in __init__
    codes, categories = factorize(values, sort=True)
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/algorithms.py", line 643, in factorize
    assume_unique=True)
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/sorting.py", line 455, in safe_sort
    ordered = sort_mixed(values)
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/sorting.py", line 441, in sort_mixed
    nums = np.sort(values[~str_pos])
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 847, in sort
    a.sort(axis=axis, kind=kind, order=order)
TypeError: '<' not supported between instances of 'NoneType' and 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 449, in __new__
    data, names=name or kwargs.get('names'))
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 1330, in from_tuples
    return MultiIndex.from_arrays(arrays, sortorder=sortorder, names=names)
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 1274, in from_arrays
    labels, levels = _factorize_from_iterables(arrays)
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2543, in _factorize_from_iterables
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2543, in <listcomp>
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2515, in _factorize_from_iterable
    cat = Categorical(values, ordered=True)
  File "/Users/jespinoz/anaconda/envs/py3_testing/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 351, in __init__
    raise TypeError("'values' is not ordered, please "
TypeError: 'values' is not ordered, please explicitly specify the categories order by passing in a categories argument

推荐答案

我希望错误消息能提供更多信息.由于上述答案,我得以找出问题所在.我最终完成了与两个版本兼容的操作:

I wish the error message was a little more informative. Thanks to the above answers I was able to figure out the issue. I ended up doing this which is compatible with both versions:

>>> pd.__version__
'0.23.4'
>>> index_categorical = pd.Index([*map(frozenset, index)], dtype="category")
>>> dict(index_categorical[0])
{'criterion': 'gini', 'max_features': 'log2', 'min_samples_leaf': 1}

pandas v0.22.0

>>> pd.__version__
'0.22.0'
>>> index_categorical = pd.Index([*map(frozenset, index)], dtype="category")
>>> dict(index_categorical[0])
{'min_samples_leaf': 1, 'criterion': 'gini', 'max_features': 'log2'}

这篇关于如何“通过传递一个类别参数来明确地指定类别顺序".当使用元组作为 pandas 的索引键时?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆