All possible permutations columns Pandas Dataframe within the same column
Question
I had a similar question using Postgres SQL, but I figured that this kind of task is really hard to do in Postgres, and I think python/pandas would make this a lot easier, although I still can't quite come up with the solution.
I now have a Pandas Dataframe which looks like this:
import pandas as pd

df = pd.DataFrame({'planid': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'x': ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']})
df
planid x
0 A a1
1 A a2
2 B b1
3 B b2
4 C c1
5 C c2
I want to get all possible permutations where the planid values are not equal to each other. In other words, think of each value in planid as a "bucket": I want all possible combinations if I were to draw one value of x from each "bucket" in planid. In this particular example, there are 8 total permutations: {(a1, b1, c1), (a1, b2, c1), (a1, b1, c2), (a1, b2, c2), (a2, b1, c1), (a2, b2, c1), (a2, b1, c2), (a2, b2, c2)}.
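The draws described above are the Cartesian product of the per-planid buckets. A minimal sketch to check the count of 8, assuming the same df as above; the names `buckets` and `combos` are illustrative only and not part of the original post:

```python
import itertools
import pandas as pd

df = pd.DataFrame({'planid': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'x': ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']})

# One list of x values per planid "bucket" (illustrative helper name).
buckets = [group.tolist() for _, group in df.groupby('planid')['x']]

# Draw one value from each bucket in every possible way.
combos = list(itertools.product(*buckets))
print(len(combos))
for combo in combos:
    print(combo)
```

This only enumerates the tuples; it does not yet produce the three-column layout asked for.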
However, I want my resulting data frame to have three columns: planid, x, and another column, perhaps named permutation_counter. The final data frame has all the different permutations labeled with permutation_counter. In other words, I want my final dataframe to look like
planid x permutation_counter
0 A a1 1
1 B b1 1
2 C c1 1
3 A a1 2
4 B b2 2
5 C c1 2
6 A a1 3
7 B b1 3
8 C c2 3
9 A a1 4
10 B b2 4
11 C c2 4
12 A a2 5
13 B b1 5
14 C c1 5
15 A a2 6
16 B b2 6
17 C c1 6
18 A a2 7
19 B b1 7
20 C c2 7
21 A a2 8
22 B b2 8
23 C c2 8
Any help would be greatly appreciated!
Recommended answer
I was trying to chain as many steps together as possible. Break them down to see what each step does :)
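import pandas as pd
# groupby iterates in sorted planid order, so take the level names from the group keys
groups = dict(list(df.groupby('planid')['x']))
df2 = pd.DataFrame(index=pd.MultiIndex.from_product(list(groups.values()), names=list(groups))).reset_index().stack().reset_index()
df2.columns = ['permutation_counter', 'planid', 'x']
df2['permutation_counter'] += 1  # number the permutations from 1 instead of 0
print(df2[['planid', 'x', 'permutation_counter']])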
planid x permutation_counter
0 A a1 1
1 B b1 1
2 C c1 1
3 A a1 2
4 B b1 2
5 C c2 2
6 A a1 3
7 B b2 3
8 C c1 3
9 A a1 4
10 B b2 4
11 C c2 4
12 A a2 5
13 B b1 5
14 C c1 5
15 A a2 6
16 B b1 6
17 C c2 6
18 A a2 7
19 B b2 7
20 C c1 7
21 A a2 8
22 B b2 8
23 C c2 8
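For anyone who wants to break the chain down, here is one possible step-by-step version of the same idea. It is a sketch rather than the only way to split it; the intermediate names (`keys`, `buckets`, `product_index`, `wide`, `stacked`, `long_df`) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'planid': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'x': ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']})

# Step 1: one Series of x values per planid (groupby iterates in sorted key order).
pairs = [(p, subdf['x']) for p, subdf in df.groupby('planid')]
keys = [p for p, _ in pairs]
buckets = [s for _, s in pairs]

# Step 2: Cartesian product of the buckets, one index level per planid.
product_index = pd.MultiIndex.from_product(buckets, names=keys)

# Step 3: turn the index levels into ordinary columns A, B, C
# (one row per permutation, numbered from 0).
wide = pd.DataFrame(index=product_index).reset_index()

# Step 4: stack the columns into rows, giving a Series indexed by
# (row number, planid).
stacked = wide.stack()

# Step 5: flatten back into a frame and label the columns.
long_df = stacked.reset_index()
long_df.columns = ['permutation_counter', 'planid', 'x']
long_df['permutation_counter'] += 1  # count from 1 rather than 0

print(long_df[['planid', 'x', 'permutation_counter']])
```

Printing `wide` after step 3 is probably the quickest way to see what is going on: it has 8 rows, one per permutation, and the stack in step 4 just unrolls each row into three (planid, x) pairs.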