Python数据框:在同一列上旋转 [英] Python dataframe: pivot on same column
问题描述
我有两列 ID和 division,如下所示。
df = pd.DataFrame(np.array([[''111','AAA'],['222', 'AAA'],['333','BBB'],['444','CCC'],['444','AAA'],['222','BBB'],['111', 'BBB']]),columns = ['ID','division'])
ID划分
0111 AAA
1222 AAA
2333 BBB
3444 CCC
4444 AAA
5222 BBB
6111 BBB
期望的输出如下所示,在这里我需要在同一列上进行透视,但是计数取决于除法。
df = pd.DataFrame(np.array([['0','2 ','1','1'],['2','0','1','1'],['1','1','0','0'],['1 ','1','0','0']]))列= ['111','222','333','444'],index = ['111','222','333 ','444'])
111222333444
111 0 2 1 1
222 2 0 1 1
333 1 1 0 0
444 1 1 0 0
因此,从技术上讲,我在ID与除法之间存在重叠。 / p>
例如:
红色突出显示的框,其中111和222 ID之间的重叠为2(AAA和BBB)。其中111和444之间的重叠为1(AAA在黑框中突出显示)。
我可以分2步在excel中完成此操作。不确定下面是否有帮助。
Step1:= SUM(COUNTIFS($ B $ 2:$ B $ 8,$ B2,$ A $ 2:$ A $ 8,$ G2),COUNTIFS($ B $ 2:$ B $ 8, $ B2,$ A $ 2:$ A $ 8,H $ 1))-1
步骤2:= IF($ G12 = H $ 1,0,SUMIFS(H $ 2:H $ 8,$ G $ 2:$ G $ 8,$ G12))
但是有什么办法可以做到使用数据框的Python。
感谢您的帮助
案例2
如果df = pd.DataFrame(np.array([['111','AAA','4'],['222','AAA','5'],['333' ,'BBB','6'],
['444','CCC','3'],['444','AAA','2'],['222','BBB' ,'2'],
['111','BBB','7']]),columns = ['ID','division','count'])
ID分区计数
0111 AAA 4
1222 AAA 5
2333 BBB 6
3444 CCC 3
4444 AAA 2
5222 BBB 2
6111 BBB 7
预期产量为
df_result = pd.DataFrame(np.array([['0','18','13','6'],['18', '0','8','7'],['13','8','0','0'],['6','7','0','0']]) ,columns = ['111','222','333','444'],index = ['111','222','333','444'])
111 222333444
111 0 18 13 6
222 18 0 8 7
333 13 8 0 0
444 6 7 0 0
计算on:在这里,关于AAA和BBB,111和222之间有重叠,因此总和为4 + 5 + 2 + 7 = 18
另一种方法是使用带有 merge
和 pd.crosstab
的自联接:
df_out = df.merge(df,on ='division')
结果= pd .crosstab(df_out.ID_x,df_out.ID_y)
np.fill_diagonal(results.values,0)
输出:
ID_y 111222333444
ID_x
111 0.0 2.0 1.0 1.0
222 2.0 0.0 1.0 1.0
333 1.0 1.0 0.0 0.0
444 1.0 1.0 0.0 0.0
案例2
df = pd.DataFrame(np.array([['111','AAA' ,'4'],['222','AAA','5'],['333','BBB','6'],
['444','CCC','3' ],['444','AAA','2'],['222','BBB','2'],
['111','BBB','7']])), column = ['ID','division','count '])
df ['count'] = df ['count']。astype(int)
df_out = df.merge(df,on ='division')
df_out = df_out.assign(count = df_out.count_x + df_out.count_y)
结果= pd.crosstab(df_out.ID_x,df_out.ID_y,df_out ['count'], aggfunc ='sum')。fillna(0)
np.fill_diagonal(results.values,0)
输出:
ID_y 111222333444
ID_x
111 0.0 18.0 13.0 6.0
222 18.0 0.0 8.0 7.0
333 13.0 8.0 0.0 0.0
444 6.0 7.0 0.0 0.0
I have two columns "ID" and "division" as shown below.
df = pd.DataFrame(np.array([['111', 'AAA'],['222','AAA'],['333','BBB'],['444','CCC'],['444','AAA'],['222','BBB'],['111','BBB']]),columns=['ID','division'])
ID division
0 111 AAA
1 222 AAA
2 333 BBB
3 444 CCC
4 444 AAA
5 222 BBB
6 111 BBB
The expected output is as shown below where I need to pivot on the same column but the count is dependent on "division". This should be presented in a heatmap.
df = pd.DataFrame(np.array([['0','2','1','1'],['2','0','1','1'],['1','1','0','0'],['1','1','0','0']]),columns=['111','222','333','444'],index=['111','222','333','444'])
111 222 333 444
111 0 2 1 1
222 2 0 1 1
333 1 1 0 0
444 1 1 0 0
So, technically I am doing an overlap between ID's with respect to division.
Example: The highlighted box in red where the overlap between 111 and 222 ID's is 2(AAA and BBB). where as the overlap between 111 and 444 is 1 (AAA highlighted in the black box).
I could do this in excel in 2 steps.Not sure if below one helps.
Step1:=SUM(COUNTIFS($B$2:$B$8,$B2,$A$2:$A$8,$G2),COUNTIFS($B$2:$B$8,$B2,$A$2:$A$8,H$1))-1
Step2:=IF($G12=H$1,0,SUMIFS(H$2:H$8,$G$2:$G$8,$G12))
But is there any way that we can do it in Python using dataframes. Appreciate your help
Case-2
if df = pd.DataFrame(np.array([['111', 'AAA','4'],['222','AAA','5'],['333','BBB','6'],
['444','CCC','3'],['444','AAA','2'], ['222','BBB','2'],
['111','BBB','7']]),columns=['ID','division','count'])
ID division count
0 111 AAA 4
1 222 AAA 5
2 333 BBB 6
3 444 CCC 3
4 444 AAA 2
5 222 BBB 2
6 111 BBB 7
Expected output would be
df_result = pd.DataFrame(np.array([['0','18','13','6'],['18','0','8','7'],['13','8','0','0'],['6','7','0','0']]),columns=['111','222','333','444'],index=['111','222','333','444'])
111 222 333 444
111 0 18 13 6
222 18 0 8 7
333 13 8 0 0
444 6 7 0 0
Calculation: Here there is an overlap between 111 and 222 with respect to divisions AAA and BBB hence the sum would be 4+5+2+7=18
Another way to do this is to use a self join with merge
and pd.crosstab
:
df_out = df.merge(df, on='division')
results = pd.crosstab(df_out.ID_x, df_out.ID_y)
np.fill_diagonal(results.values, 0)
Output:
ID_y 111 222 333 444
ID_x
111 0.0 2.0 1.0 1.0
222 2.0 0.0 1.0 1.0
333 1.0 1.0 0.0 0.0
444 1.0 1.0 0.0 0.0
Case 2
df = pd.DataFrame(np.array([['111', 'AAA','4'],['222','AAA','5'],['333','BBB','6'],
['444','CCC','3'],['444','AAA','2'], ['222','BBB','2'],
['111','BBB','7']]),columns=['ID','division','count'])
df['count'] = df['count'].astype(int)
df_out = df.merge(df, on='division')
df_out = df_out.assign(count = df_out.count_x + df_out.count_y)
results = pd.crosstab(df_out.ID_x, df_out.ID_y, df_out['count'], aggfunc='sum').fillna(0)
np.fill_diagonal(results.values, 0)
Output:
ID_y 111 222 333 444
ID_x
111 0.0 18.0 13.0 6.0
222 18.0 0.0 8.0 7.0
333 13.0 8.0 0.0 0.0
444 6.0 7.0 0.0 0.0
这篇关于Python数据框:在同一列上旋转的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!