pandas: Combining Multiple Categories into One

Question

Let's say I have categories 1 to 10, and I want to assign red to values 3 to 5, green to 1, 6, and 7, and blue to 2, 8, 9, and 10.

How would I do this? If I try

df.cat.rename_categories(['red','green','blue'])

I get an error: ValueError: new categories need to have the same number of items than the old categories! But if I put this in

df.cat.rename_categories(['green', 'blue', 'red', 'red', 'red',
                          'green', 'green', 'blue', 'blue', 'blue'])

I get an error saying that there are duplicate values.
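Both failures can be reproduced with a small sketch. The Series `s` below is a hypothetical stand-in for the asker's data (a categorical with categories 1 to 10), and `try_rename` is a helper introduced only for illustration. It assumes a reasonably recent pandas, where both calls are expected to raise ValueError because `rename_categories` requires one unique new name per old category.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: a categorical with categories 1..10.
s = pd.Series(range(1, 11), dtype="category")


def try_rename(new_labels):
    """Return the name of the exception raised by rename_categories, or None."""
    try:
        s.cat.rename_categories(new_labels)
    except ValueError as exc:
        return type(exc).__name__
    return None


# Three labels for ten categories: the lengths do not match.
too_few = try_rename(["red", "green", "blue"])

# Ten labels, but with repeats: category names must be unique.
duplicated = try_rename(
    ["green", "blue", "red", "red", "red",
     "green", "green", "blue", "blue", "blue"]
)

print(too_few, duplicated)
```

This is why renaming alone cannot merge categories: the values have to be mapped to new labels first, and a new categorical built from the result.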

The only other method I can think of is to write a for loop that goes through a dictionary of the values and replaces them. Is there a more elegant way of resolving this?

Recommended answer

Not sure about elegance, but if you make a dict mapping the old categories to the new ones, something like this works (note the added 'purple'):

>>> m = {"red": [3,4,5], "green": [1,6,7], "blue": [2,8,9,10], "purple": [11]}
>>> m2 = {v: k for k,vv in m.items() for v in vv}
>>> m2
{1: 'green', 2: 'blue', 3: 'red', 4: 'red', 5: 'red', 6: 'green', 
 7: 'green', 8: 'blue', 9: 'blue', 10: 'blue', 11: 'purple'}

You can use this to build a new categorical Series:

>>> df.cat.map(m2).astype("category", categories=set(m2.values()))
0    green
1     blue
2      red
3      red
4      red
5    green
6    green
7     blue
8     blue
9     blue
Name: cat, dtype: category
Categories (4, object): [green, purple, red, blue]
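The `categories=` keyword of `astype` shown above comes from an older pandas API and is not accepted by recent releases. A minimal sketch of the same idea for current pandas follows, assuming the data is a DataFrame `df` with a categorical column named `cat` (constructed here as hypothetical sample data). It passes a `pd.CategoricalDtype` instead, and uses a list rather than a set so that the category order is deterministic.

```python
import pandas as pd

# Hypothetical data matching the example: a column named "cat" holding 1..10.
df = pd.DataFrame({"cat": pd.Series(range(1, 11), dtype="category")})

m = {"red": [3, 4, 5], "green": [1, 6, 7], "blue": [2, 8, 9, 10], "purple": [11]}
# Invert the dict: old category value -> new label.
m2 = {old: new for new, olds in m.items() for old in olds}

# Dict key order gives a stable category order; a set would not.
new_dtype = pd.CategoricalDtype(categories=list(m))

# Convert to plain objects first so map() works on ordinary values,
# then cast the mapped labels to the explicit categorical dtype.
combined = df["cat"].astype(object).map(m2).astype(new_dtype)

print(combined.tolist())
print(list(combined.cat.categories))
```

Any old value missing from `m2` would become NaN after `map`, so it is worth checking `combined.isna().any()` on real data.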

You don't need the categories=set(m2.values()) argument (or an ordered equivalent, if you care about the categorical ordering) if you're sure that every categorical value will appear in the column. But here, without it, purple would not appear in the resulting Categorical, because the categories would be built only from the values actually seen.
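The difference can be seen in a short sketch. The Series `values` is hypothetical sample data standing in for an already-mapped column that happens to contain no 'purple'. It assumes a recent pandas, where explicit categories are supplied through `pd.CategoricalDtype`.

```python
import pandas as pd

# Hypothetical mapped labels; note that "purple" never occurs.
values = pd.Series(["green", "blue", "red", "red", "green"])

# Categories inferred from the data: only labels that actually occur.
inferred = values.astype("category")

# Categories stated explicitly: the unused "purple" is kept.
explicit = values.astype(pd.CategoricalDtype(["red", "green", "blue", "purple"]))

# The ordered equivalent, for when the category order carries meaning.
ordered = values.astype(
    pd.CategoricalDtype(["red", "green", "blue", "purple"], ordered=True)
)

print(list(inferred.cat.categories))
print(list(explicit.cat.categories))
```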

Of course, if you already have your list ['green','blue','red', etc.] built, it's just as easy to use it to make a new categorical column directly and bypass this mapping entirely.
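One possible reading of this remark, sketched under assumptions: the list holds one new label per old category, in the same order as the old categories, so it can be indexed by the integer codes of the existing categorical. The Series `s` and the names `labels` and `combined` are hypothetical, and the sketch assumes the column has no missing values (a missing value has code -1, which would silently pick the last label).

```python
import numpy as np
import pandas as pd

# Hypothetical data: a few values drawn from categories 1..10.
s = pd.Series([3, 1, 10, 5, 2], dtype=pd.CategoricalDtype(list(range(1, 11))))

# One new label per old category, in the order of s.cat.categories.
labels = np.array(
    ["green", "blue", "red", "red", "red",
     "green", "green", "blue", "blue", "blue"]
)

# cat.codes holds each row's position in s.cat.categories,
# so indexing labels by the codes yields the new label per row.
codes = s.cat.codes.to_numpy()
combined = pd.Series(
    pd.Categorical(labels[codes], categories=["red", "green", "blue"]),
    index=s.index,
)

print(combined.tolist())
```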
