Trouble pivoting in pandas (spread in R)


Question

I'm having some issues with the pd.pivot() and pivot_table() functions in pandas.

I have this:

import pandas as pd

# Note that 'kw' holds strings, not numbers.
df = pd.DataFrame({'site_id': {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c', 5: 'c', 6: 'a', 7: 'a', 8: 'b', 9: 'b', 10: 'c', 11: 'c'},
                   'dt': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 2, 10: 2, 11: 2},
                   'eu': {0: 'FGE', 1: 'WSH', 2: 'FGE', 3: 'WSH', 4: 'FGE', 5: 'WSH', 6: 'FGE', 7: 'WSH', 8: 'FGE', 9: 'WSH', 10: 'FGE', 11: 'WSH'},
                   'kw': {0: '8', 1: '5', 2: '3', 3: '7', 4: '1', 5: '5', 6: '2', 7: '3', 8: '5', 9: '7', 10: '2', 11: '5'}})


df
Out[140]: 
    dt   eu kw site_id
0    1  FGE  8       a
1    1  WSH  5       a
2    1  FGE  3       b
3    1  WSH  7       b
4    1  FGE  1       c
5    1  WSH  5       c
6    2  FGE  2       a
7    2  WSH  3       a
8    2  FGE  5       b
9    2  WSH  7       b
10   2  FGE  2       c
11   2  WSH  5       c

I want this:

dt   site_id   FGE   WSH
 1         a     8     5
 1         b     3     7
 1         c     1     5
 2         a     2     3
 2         b     5     7
 2         c     2     5

I've tried everything! Both of these:

df.pivot_table(index = ['site_id','dt'], values = 'kw', columns = 'eu')

df.pivot(index = ['site_id','dt'], values = 'kw', columns = 'eu')

should have worked. I also tried unstack():

df.set_index(['dt','site_id','eu']).unstack(level = -1)

Recommended answer

Your last try (with unstack) works fine for me; I'm not sure why it gave you a problem. FWIW, I think it's more readable to use the index names rather than level numbers, so I did it like this:

>>> df.set_index(['dt','site_id','eu']).unstack('eu')

            kw    
eu         FGE WSH
dt site_id        
1  a         8   5
   b         3   7
   c         1   5
2  a         2   3
   b         5   7
   c         2   5

But again, your way looks fine to me and is pretty much the same as what @piRSquared did (except their answer adds some more code to get rid of the multi-index).
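To get the flat layout asked for in the question, the multi-index can be dropped after unstacking. The following is a minimal sketch of one way to do that, not a copy of @piRSquared's code; the variable name `wide` and the list-based construction of `df` are assumptions made for illustration.

```python
import pandas as pd

# Same data as in the question, built from lists for brevity.
df = pd.DataFrame({
    'site_id': ['a', 'a', 'b', 'b', 'c', 'c'] * 2,
    'dt': [1] * 6 + [2] * 6,
    'eu': ['FGE', 'WSH'] * 6,
    'kw': ['8', '5', '3', '7', '1', '5', '2', '3', '5', '7', '2', '5'],
})

wide = (
    df.set_index(['dt', 'site_id', 'eu'])['kw']  # take the Series so there is no outer 'kw' column level
      .unstack('eu')                             # 'eu' values become the columns
      .reset_index()                             # turn 'dt' and 'site_id' back into ordinary columns
      .rename_axis(columns=None)                 # drop the leftover 'eu' label on the columns axis
)
print(wide)
```

Selecting `['kw']` before unstacking is what avoids the two-level column header shown in the output above.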

I think the problem with pivot is that you can only pass a single column name as the index, not a list? Anyway, this works for me:

>>> df.set_index(['dt','site_id']).pivot(columns='eu')
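For readers on newer releases: as far as I know, pandas 1.1.0 and later let `DataFrame.pivot` take a list for `index`, so the `set_index` step is no longer needed there. A short sketch, assuming pandas 1.1 or newer (the name `wide` is just for illustration):

```python
import pandas as pd

# Same data as in the question, built from lists for brevity.
df = pd.DataFrame({
    'site_id': ['a', 'a', 'b', 'b', 'c', 'c'] * 2,
    'dt': [1] * 6 + [2] * 6,
    'eu': ['FGE', 'WSH'] * 6,
    'kw': ['8', '5', '3', '7', '1', '5', '2', '3', '5', '7', '2', '5'],
})

# Assumes pandas >= 1.1, where `index` may be a list of column names.
# pivot does no aggregation, so the string values in 'kw' are fine as they are.
wide = df.pivot(index=['dt', 'site_id'], columns='eu', values='kw')
print(wide)
```

On older versions the `set_index(...).pivot(columns='eu')` form above remains the workaround.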

For pivot_table, the main issue is that 'kw' is an object/character column, and pivot_table will attempt to aggregate with numpy.mean by default. You probably got the error message: "DataError: No numeric types to aggregate".

But there are a couple of workarounds. First, you could just convert 'kw' to a numeric type and then use your same pivot_table command:

>>> df['kw'] = df['kw'].astype(int)
>>> df.pivot_table(index = ['dt','site_id'], values = 'kw', columns = 'eu')
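If you would rather not overwrite the original column, the conversion can be done on the fly. This is a sketch of that variant using `pd.to_numeric` and `DataFrame.assign`; the names `numeric` and `wide` are hypothetical. Because the default aggregation is the mean, the result holds floats.

```python
import pandas as pd

# Same data as in the question, built from lists for brevity.
df = pd.DataFrame({
    'site_id': ['a', 'a', 'b', 'b', 'c', 'c'] * 2,
    'dt': [1] * 6 + [2] * 6,
    'eu': ['FGE', 'WSH'] * 6,
    'kw': ['8', '5', '3', '7', '1', '5', '2', '3', '5', '7', '2', '5'],
})

# 'kw' holds strings, which is why the default mean aggregation fails on it.
print(pd.api.types.is_numeric_dtype(df['kw']))

# Convert on a copy, then pivot with the default aggregation (mean).
numeric = df.assign(kw=pd.to_numeric(df['kw']))
wide = numeric.pivot_table(index=['dt', 'site_id'], columns='eu', values='kw')
print(wide)
```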

Alternatively, you could change the aggregation function:

>>> df.pivot_table(index = ['dt','site_id'], values = 'kw', columns = 'eu', 
                   aggfunc=sum )

That relies on the fact that strings can be summed (concatenated) even though you can't take a mean of them. Really, you can use most functions here (including lambdas) that operate on strings.

Note, however, that pivot_table's aggfunc has to be some sort of reduction here, even though you have only a single value per cell and so there is nothing to reduce. There is a check in the code that requires a reduction operation, so you have to supply one.
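Since each (dt, site_id, eu) combination occurs only once, a reduction that simply picks the single value is enough. The sketch below first confirms that assumption and then uses `aggfunc='first'`, which is my suggestion rather than something from the question; it leaves the strings untouched.

```python
import pandas as pd

# Same data as in the question, built from lists for brevity.
df = pd.DataFrame({
    'site_id': ['a', 'a', 'b', 'b', 'c', 'c'] * 2,
    'dt': [1] * 6 + [2] * 6,
    'eu': ['FGE', 'WSH'] * 6,
    'kw': ['8', '5', '3', '7', '1', '5', '2', '3', '5', '7', '2', '5'],
})

# Check that every cell of the pivoted table gets exactly one value.
has_duplicates = df.duplicated(['dt', 'site_id', 'eu']).any()
print(has_duplicates)

# 'first' satisfies the need for a reduction without changing the values.
wide = df.pivot_table(index=['dt', 'site_id'], columns='eu',
                      values='kw', aggfunc='first')
print(wide)
```

If the duplicate check ever came back true, 'first' would silently discard values, so in that case a real aggregation would be the safer choice.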
