Combination of Pivot and Transpose in Python
Problem description
I am doing some text analysis and have data that looks roughly like this:
**TABLE 1**
C1 C2 C3
A1 TEXT1 ANOTHER_TEXT1
A2 TEXT1 ANOTHER_TEXT1
B1 TEXT2 ANOTHER_TEXT1
B2 TEXT2 ANOTHER_TEXT1
B3 TEXT2 ANOTHER_TEXT1
D1 TEXT3 ANOTHER_TEXT2
D2 TEXT3 ANOTHER_TEXT2
What I really need is a dataset aggregated over C2, with the contents of C1 spread across separate columns. Essentially, this is what df.transpose is supposed to do. The problem is that if I transpose, it does not aggregate C2 and C3.
Essentially, this is the structure I am looking for:
**TABLE 2**
C3 C2 CT1 CT2 CT3
ANOTHER_TEXT1 TEXT1 A1 A2 NA
ANOTHER_TEXT1 TEXT2 B1 B2 B3
ANOTHER_TEXT2 TEXT3 D1 D2 NA
I am trying df.pivot_table(index=['C2','C3'], aggfunc='count'), which correctly gives me the count of occurrences (shown below).
**TABLE 3**
C3 C2 CT1
ANOTHER_TEXT1 TEXT1 2
ANOTHER_TEXT1 TEXT2 3
ANOTHER_TEXT2 TEXT3 2
So, how do I get it into the structure I want (Table 2)? Is it possible at all?
If not, what alternatives do I have? That is, which structure would be closest to the one I want?
Recommended answer
You can use cumcount to number the rows within each group, which gives the labels for the new columns. Then reshape with set_index and unstack, and finally rename the columns with add_prefix:
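# Number the rows within each (C2, C3) group: 1, 2, 3, ...
df['g'] = df.groupby(['C2','C3']).cumcount() + 1
# Move the counter into the columns, then prefix the new column names with 'CT'
df = df.set_index(['C2','C3', 'g'])['C1'].unstack().add_prefix('CT').reset_index()
print(df)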
C2 C3 CT1 CT2 CT3
0 TEXT1 ANOTHER_TEXT1 A1 A2 None
1 TEXT2 ANOTHER_TEXT1 B1 B2 B3
2 TEXT3 ANOTHER_TEXT2 D1 D2 None
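To make the reshaping step easier to follow, here is a minimal self-contained sketch that rebuilds Table 1 and prints the intermediate counter as well as the final table. The names `positions` and `wide` are illustrative only and do not appear in the answer. How missing cells are displayed (None or NaN) may depend on the pandas version.

```python
import pandas as pd

# Sample data copied from Table 1 in the question.
df = pd.DataFrame({
    'C1': ['A1', 'A2', 'B1', 'B2', 'B3', 'D1', 'D2'],
    'C2': ['TEXT1', 'TEXT1', 'TEXT2', 'TEXT2', 'TEXT2', 'TEXT3', 'TEXT3'],
    'C3': ['ANOTHER_TEXT1'] * 5 + ['ANOTHER_TEXT2'] * 2,
})

# cumcount numbers the rows inside each (C2, C3) group starting at 0,
# so adding 1 gives the position that becomes the column suffix.
df['g'] = df.groupby(['C2', 'C3']).cumcount() + 1
positions = df['g'].tolist()

# Move the counter into the columns; groups with fewer rows get missing values.
wide = (
    df.set_index(['C2', 'C3', 'g'])['C1']
      .unstack()
      .add_prefix('CT')
      .reset_index()
)
wide.columns.name = None  # drop the leftover 'g' label on the column axis

print(positions)
print(wide)
```

The counter is what makes unstack possible: without it, the index (C2, C3) would contain duplicates and the reshape would fail.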
Another solution uses groupby and, for the new columns, the Series constructor:
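# Start again from the original df (requires: import pandas as pd)
df = (df.groupby(['C2','C3'])['C1']
        .apply(lambda x: pd.Series(x.values))   # one Series per group, indexed 0, 1, 2, ...
        .unstack()                               # spread those positions into columns
        .rename(columns=lambda x: 'CT{}'.format(x+1))  # 0 -> CT1, 1 -> CT2, ...
        .reset_index())
print(df)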
C2 C3 CT1 CT2 CT3
0 TEXT1 ANOTHER_TEXT1 A1 A2 None
1 TEXT2 ANOTHER_TEXT1 B1 B2 B3
2 TEXT3 ANOTHER_TEXT2 D1 D2 None
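Since the question started from pivot_table, a third variant may be worth noting. This is not part of the answer above, only a sketch under the assumption that each (C2, C3, position) cell holds exactly one C1 value, so aggfunc='first' simply returns that value instead of counting it. The names `numbered` and `wide` are illustrative.

```python
import pandas as pd

# Sample data copied from Table 1 in the question.
df = pd.DataFrame({
    'C1': ['A1', 'A2', 'B1', 'B2', 'B3', 'D1', 'D2'],
    'C2': ['TEXT1', 'TEXT1', 'TEXT2', 'TEXT2', 'TEXT2', 'TEXT3', 'TEXT3'],
    'C3': ['ANOTHER_TEXT1'] * 5 + ['ANOTHER_TEXT2'] * 2,
})

# Same per-group counter as in the cumcount solution, added without mutating df.
numbered = df.assign(g=df.groupby(['C2', 'C3']).cumcount() + 1)

# aggfunc='first' keeps the single C1 value in each (C2, C3, g) cell,
# which is why the text values survive the pivot instead of being counted.
wide = (
    numbered.pivot_table(index=['C2', 'C3'], columns='g',
                         values='C1', aggfunc='first')
            .add_prefix('CT')
            .reset_index()
)
wide.columns.name = None  # drop the leftover 'g' label on the column axis

print(wide)
```

This keeps the original pivot_table call shape from the question and only adds the columns and values arguments plus a different aggregation.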