Missing data in pandas.crosstab

Question

I'm making some crosstabs with pandas:

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

b     one   two       
c    dull  dull  shiny
a                     
bar     1     1      0
foo     2     1      2

But what I actually want is the following:

b     one        two       
c    dull  shiny dull  shiny
a                     
bar     1     0    1      0
foo     2     0    1      2

I found a workaround by adding a new column and setting the levels as a new MultiIndex, but it seems needlessly difficult...

Is there any way to pass a MultiIndex to the crosstab function to predefine the output columns?

Recommended answer

I don't think there is a way to do this. crosstab calls pivot_table in the source, which doesn't seem to offer this either. I raised it as an issue here.

A hacky workaround (which may or may not be the same as you were already using...):

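from itertools import product
ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
# every (b, c) combination, including pairs that never occur in the data
b_x_c = list(product(np.unique(b), np.unique(c)))
b_x_c = pd.MultiIndex.from_tuples(b_x_c)

# reindex_axis was removed from later pandas releases; reindex(columns=...) does the same job
In [15]: ct.reindex(columns=b_x_c, fill_value=0)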
Out[15]:
      one          two
     dull  shiny  dull  shiny
a
bar     1      0     1      0
foo     2      0     1      2
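
A self-contained version of the workaround, offered as a sketch rather than the answer's exact code: it assumes a pandas version where DataFrame.reindex accepts columns= and fill_value=, and it swaps itertools.product for pd.MultiIndex.from_product. The names full_columns and full_ct are made up for illustration.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

# Full Cartesian product of the observed b and c values
# (full_columns is an illustrative name, not part of pandas).
full_columns = pd.MultiIndex.from_product(
    [np.unique(b), np.unique(c)], names=['b', 'c']
)

# (b, c) pairs missing from the data become all-zero columns.
full_ct = ct.reindex(columns=full_columns, fill_value=0)
print(full_ct)
```

Passing fill_value=0 to reindex avoids the intermediate NaN values that a separate fillna(0) step would have to clean up.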

If product is too slow, here is a numpy implementation of it.
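
The linked NumPy implementation is not reproduced in this text. As a rough idea of what such a replacement could look like for the two-level case here, the sketch below builds the pairs with np.repeat and np.tile; cartesian_pairs is a hypothetical helper, not a NumPy or pandas function, and it is not necessarily the implementation the answer refers to.

```python
import numpy as np
import pandas as pd

def cartesian_pairs(x, y):
    """Return all (x_i, y_j) pairs as two flat arrays, with x varying slowest."""
    x = np.asarray(x)
    y = np.asarray(y)
    first = np.repeat(x, len(y))   # x0, x0, ..., x1, x1, ...
    second = np.tile(y, len(x))    # y0, y1, ..., y0, y1, ...
    return first, second

b_vals = np.array(['one', 'two'], dtype=object)
c_vals = np.array(['dull', 'shiny'], dtype=object)

first, second = cartesian_pairs(b_vals, c_vals)
columns = pd.MultiIndex.from_arrays([first, second], names=['b', 'c'])
print(list(columns))
```

The resulting MultiIndex can be passed to reindex in the same way as the one built from itertools.product.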
