How can I ignore empty series when using value_counts on a Pandas groupby?


Problem description

I've got a DataFrame in which each row holds the metadata for one newspaper article. I'd like to group the rows into monthly chunks, then count the values of one column (called type):

monthly_articles = articles.groupby(pd.Grouper(freq="M"))
monthly_articles = monthly_articles["type"].value_counts().unstack()

This works fine with annual groups but fails when I try to group by month:

ValueError: operands could not be broadcast together with shape (141,) (139,)

I think this is because some monthly groups contain no articles. If I iterate over the groups and print value_counts for each one:

for name, group in monthly_articles:
    print(name, group["type"].value_counts())

I get empty series in the groups for January and February 2006:

2005-12-31 00:00:00 positive    1
Name: type, dtype: int64
2006-01-31 00:00:00 Series([], Name: type, dtype: int64)
2006-02-28 00:00:00 Series([], Name: type, dtype: int64)
2006-03-31 00:00:00 negative    6
positive    5
neutral     1
Name: type, dtype: int64
2006-04-30 00:00:00 negative    11
positive     6
neutral      3
Name: type, dtype: int64

How can I ignore the empty groups when using value_counts()?

I've tried dropna=False without success. I think this is the same issue as this question.
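For context, here is a small sketch of what dropna controls in Series.value_counts. It decides whether NaN values get their own count, so it has no effect on a group that contains no rows at all. The series names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a series with one missing value, and a completely empty series.
with_nan = pd.Series(["positive", np.nan, "positive"], name="type")
empty = pd.Series([], dtype=object, name="type")

# dropna=True (the default) skips NaN; dropna=False counts it as its own entry.
counts_default = with_nan.value_counts()
counts_with_nan = with_nan.value_counts(dropna=False)

# An empty series yields an empty result regardless of dropna.
empty_counts = empty.value_counts(dropna=False)

print(counts_default)
print(counts_with_nan)
print(len(empty_counts))
```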

Recommended answer

It would help if you provided a data sample; without one, it is hard to pinpoint the problem. From your code snippet, it seems that the type data for some months is null. You can call apply on the grouped object and then unstack the result. Here is sample code that works for me, using randomly generated data:

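import numpy as np
import pandas as pd

# Randomly generated sample data: one label per day for 150 days
s = pd.Series(['positive', 'negtive', 'neutral'], index=[0, 1, 2])
atype = s.loc[np.random.randint(3, size=(150,))]

df = pd.DataFrame(dict(atype=atype.values), index=pd.date_range('2017-01-01', periods=150))

# Count the labels within each month, then pivot the labels into columns
gp = df.groupby(pd.Grouper(freq='M'))
dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()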

In [75]: dfx
Out[75]: 
            negtive  neutral  positive
2017-01-31       13        9         9
2017-02-28       11       11         6
2017-03-31       12        6        13
2017-04-30        8       12        10
2017-05-31        9       10        11

If there are null values:

In [76]: df.loc['2017-02-01':'2017-04-01', 'atype'] = np.nan
    ...: gp = df.groupby(pd.Grouper(freq='M'))
    ...: dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()
    ...: 

In [77]: dfx
Out[77]: 
            negtive  neutral  positive
2017-01-31       13        9         9
2017-04-30        8       12         9
2017-05-31        9       10        11
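
As a possible alternative (not part of the original answer), the month and the type can be grouped together, so that only month/type pairs that actually occur are counted. The sketch below uses made-up deterministic data and monthly periods from df.index.to_period("M") in place of pd.Grouper; the names labels, months and counts are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one label per day for 150 days, cycling through three types.
idx = pd.date_range("2017-01-01", periods=150)
labels = np.array(["positive", "negative", "neutral"])
df = pd.DataFrame({"atype": labels[np.arange(150) % 3]}, index=idx)

# Blank out February and March so those months have no usable values.
df.loc["2017-02-01":"2017-03-31", "atype"] = np.nan

# Group by calendar month and by type at the same time. Rows with a missing
# type are dropped by default, and months without any remaining rows never
# appear in the result.
months = df.index.to_period("M")
counts = df.groupby([months, "atype"]).size().unstack(fill_value=0)

print(counts)
```

With this data, February and March drop out because they contain no non-null values. Note that the resulting index holds monthly periods rather than month-end timestamps.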

Thanks.
