Combination of Pivot and Transpose in Python

Problem description

I am doing some text analysis and have data that looks like this:

**TABLE 1**
C1  C2      C3
A1  TEXT1   ANOTHER_TEXT1
A2  TEXT1   ANOTHER_TEXT1
B1  TEXT2   ANOTHER_TEXT1
B2  TEXT2   ANOTHER_TEXT1
B3  TEXT2   ANOTHER_TEXT1
D1  TEXT3   ANOTHER_TEXT2
D2  TEXT3   ANOTHER_TEXT2
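To reproduce the question, Table 1 can be loaded into a pandas DataFrame. This is a minimal sketch that assumes all three columns hold plain strings exactly as printed; the name `df` matches the variable used in the answer.

```python
import pandas as pd

# Table 1: C1 holds the item IDs, C2 and C3 the text keys to group on.
df = pd.DataFrame({'C1': ['A1', 'A2', 'B1', 'B2', 'B3', 'D1', 'D2'],
                   'C2': ['TEXT1'] * 2 + ['TEXT2'] * 3 + ['TEXT3'] * 2,
                   'C3': ['ANOTHER_TEXT1'] * 5 + ['ANOTHER_TEXT2'] * 2})
print(df)
```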

What I really need is a dataset aggregated over C2 and C3, with the contents of C1 spread out into separate columns. Essentially, that is what df.transpose is supposed to do, but if I transpose, it does not aggregate over C2 and C3.
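A quick sketch of why a plain transpose cannot do this: `df.T` only swaps rows and columns, so the seven rows of Table 1 become seven columns and nothing is grouped. The Table 1 data is rebuilt inline (assuming plain string columns) so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'C1': ['A1', 'A2', 'B1', 'B2', 'B3', 'D1', 'D2'],
                   'C2': ['TEXT1'] * 2 + ['TEXT2'] * 3 + ['TEXT3'] * 2,
                   'C3': ['ANOTHER_TEXT1'] * 5 + ['ANOTHER_TEXT2'] * 2})

# Transposing just swaps the axes: the column names become the row labels
# and every original row becomes its own column; no aggregation happens.
t = df.T
print(t.shape)  # (3, 7)
print(t)
```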

This is the structure I am looking for:

**TABLE 2**
C3              C2      CT1  CT2  CT3
ANOTHER_TEXT1   TEXT1   A1   A2   NA
ANOTHER_TEXT1   TEXT2   B1   B2   B3
ANOTHER_TEXT2   TEXT3   D1   D2   NA

I tried df.pivot_table(index=['C2','C3'], aggfunc='count'), which correctly gives me the number of occurrences (shown below):

**TABLE 3**
C3              C2      C1
ANOTHER_TEXT1   TEXT1   2
                TEXT2   3
ANOTHER_TEXT2   TEXT3   2
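The counting step can be reproduced with the sketch below (Table 1 rebuilt inline, assuming plain string columns). pivot_table groups on C2 and C3 and counts the one remaining column, C1, so it returns a single number per group rather than the C1 values themselves; that is why it cannot produce Table 2 on its own.

```python
import pandas as pd

df = pd.DataFrame({'C1': ['A1', 'A2', 'B1', 'B2', 'B3', 'D1', 'D2'],
                   'C2': ['TEXT1'] * 2 + ['TEXT2'] * 3 + ['TEXT3'] * 2,
                   'C3': ['ANOTHER_TEXT1'] * 5 + ['ANOTHER_TEXT2'] * 2})

# Group on (C2, C3) and count the non-null C1 entries in each group.
counts = df.pivot_table(index=['C2', 'C3'], aggfunc='count')
print(counts)
```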

So how do I get the data into the structure I want (Table 2)? Is that possible at all?

If not, what alternatives do I have, i.e. which structure would be closest to the one I want?

Recommended answer

You can use cumcount to number the new columns, then reshape with set_index and unstack, and finally rename the columns with add_prefix:

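import pandas as pd
# number the rows inside each (C2, C3) group: 1, 2, 3, ...
df['g'] = df.groupby(['C2','C3']).cumcount() + 1
# move g into the columns, prefix the labels with 'CT' and drop the leftover 'g' axis name
out = (df.set_index(['C2','C3','g'])['C1']
         .unstack()
         .add_prefix('CT')
         .rename_axis(None, axis=1)
         .reset_index())
print(out)
      C2             C3 CT1 CT2  CT3
0  TEXT1  ANOTHER_TEXT1  A1  A2  NaN
1  TEXT2  ANOTHER_TEXT1  B1  B2   B3
2  TEXT3  ANOTHER_TEXT2  D1  D2  NaN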
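To make the first approach concrete, here is a sketch that prints the helper column `g` produced by cumcount and then feeds it to DataFrame.pivot instead of set_index/unstack. The pivot variant is a swapped-in equivalent, not part of the original answer, and it assumes pandas 1.1 or later, where `pivot` accepts a list for `index`. Table 1 is rebuilt inline.

```python
import pandas as pd

df = pd.DataFrame({'C1': ['A1', 'A2', 'B1', 'B2', 'B3', 'D1', 'D2'],
                   'C2': ['TEXT1'] * 2 + ['TEXT2'] * 3 + ['TEXT3'] * 2,
                   'C3': ['ANOTHER_TEXT1'] * 5 + ['ANOTHER_TEXT2'] * 2})

# cumcount numbers rows within each (C2, C3) group from 0; adding 1 gives 1, 2, 3, ...
df['g'] = df.groupby(['C2', 'C3']).cumcount() + 1
print(df['g'].tolist())  # [1, 2, 1, 2, 3, 1, 2]

# Each (C2, C3) pair becomes one row and each g value one column holding C1;
# groups with fewer items get NaN in the trailing columns.
wide = (df.pivot(index=['C2', 'C3'], columns='g', values='C1')
          .add_prefix('CT')
          .rename_axis(None, axis=1)
          .reset_index())
print(wide)
```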

Another solution uses groupby and builds the new columns with the Series constructor:

import pandas as pd
# one small Series per (C2, C3) group, indexed 0..n-1; unstack turns that position level into columns
out = (df.groupby(['C2','C3'])['C1']
         .apply(lambda x: pd.Series(x.values))
         .unstack()
         .rename(columns=lambda x: 'CT{}'.format(x + 1))
         .reset_index())
print(out)
      C2             C3 CT1 CT2  CT3
0  TEXT1  ANOTHER_TEXT1  A1  A2  NaN
1  TEXT2  ANOTHER_TEXT1  B1  B2   B3
2  TEXT3  ANOTHER_TEXT2  D1  D2  NaN
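The second approach works because groupby(...).apply returns, for each (C2, C3) group, a Series indexed 0..n-1, and pandas concatenates those into one Series whose innermost index level is the position inside the group; unstack then moves that level into the columns. A small sketch printing that intermediate Series, with Table 1 rebuilt inline under the same plain-string assumption:

```python
import pandas as pd

df = pd.DataFrame({'C1': ['A1', 'A2', 'B1', 'B2', 'B3', 'D1', 'D2'],
                   'C2': ['TEXT1'] * 2 + ['TEXT2'] * 3 + ['TEXT3'] * 2,
                   'C3': ['ANOTHER_TEXT1'] * 5 + ['ANOTHER_TEXT2'] * 2})

# One Series per group, indexed by position; pandas stacks them under (C2, C3).
stacked = df.groupby(['C2', 'C3'])['C1'].apply(lambda x: pd.Series(x.values))
print(stacked)

# Level 2 of the MultiIndex holds the positions that unstack turns into columns.
print(stacked.index.get_level_values(2).tolist())  # [0, 1, 0, 1, 2, 0, 1]
```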
