Identify number of occurrences of any combination of equal elements in DataFrame
Problem Description
I have a mock DataFrame, df1, with 6 columns and 5 rows, i.e., with shape (5 x 6).
Each column represents the price in one area, and each row is a point in time.
DK1 DK2 NO1 NO2 NO3 NO4
0 10 10 12 15 15 10
1 15 10 15 10 10 10
2 15 15 15 15 15 15
3 10 10 12 15 15 10
4 10 10 10 10 15 15
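To reproduce the example, the mock frame can be built directly from the table above. This is a minimal sketch; the name df1 follows the question, and the integer index 0 to 4 is simply pandas' default.

```python
import pandas as pd

# Mock price data from the question: one column per area, one row per time step
df1 = pd.DataFrame(
    {
        "DK1": [10, 15, 15, 10, 10],
        "DK2": [10, 10, 15, 10, 10],
        "NO1": [12, 15, 15, 12, 10],
        "NO2": [15, 10, 15, 15, 10],
        "NO3": [15, 10, 15, 15, 15],
        "NO4": [10, 10, 15, 10, 15],
    }
)
print(df1.shape)  # (5, 6)
```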
Now I want to identify, in each row, the areas with the same price as the first column, "DK1", and then count how often each combination of columns is equal. For this example, the desired output would be:
Combo Occurrence
0 DK1-DK2-NO4 2
1 DK1-DK2-NO1-NO2-NO3-NO4 1
2 DK1-NO1 1
3 DK1-DK2-NO1-NO2 1
Preferably, the solution should work for a DataFrame of any size. I tried the .apply() method, but couldn't get it working. Hope you can help.
Recommended Answer
Compare the DataFrame with its first column using DataFrame.eq, then use matrix multiplication with DataFrame.dot against the column names with a separator appended, and finally count the combinations with Series.value_counts and convert the result to a DataFrame:
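# df holds the question's price data (df1); the result gets a new name
# so the original frame stays available for the snippet further down
out = (df.eq(df['DK1'], axis=0)    # True where a price equals DK1's price
         .dot(df.columns + ',')    # concatenate "name," for each True cell
         .str[:-1]                 # drop the trailing comma
         .value_counts()           # count each distinct combination
         .rename_axis('Combo')
         .reset_index(name='Occurrence'))
print(out)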
Combo Occurrence
0 DK1,DK2,NO4 2
1 DK1,NO1 1
2 DK1,DK2,NO1,NO2 1
3 DK1,DK2,NO1,NO2,NO3,NO4 1
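The joining step relies on DataFrame.dot with a boolean mask: multiplying a Python bool by a string gives the string (True) or an empty string (False), and summing those products concatenates the names of the matching columns. A minimal sketch of the intermediate results on a two-row frame (the frame `demo` and the names `mask` and `joined` are illustrative only):

```python
import pandas as pd

demo = pd.DataFrame({"A": [1, 2], "B": [1, 3], "C": [5, 2]})

# Boolean mask: True where a cell equals the first column in the same row
# row 0 -> [True, True, False], row 1 -> [True, False, True]
mask = demo.eq(demo["A"], axis=0)

# Each row becomes the concatenation of "name," for every True cell
joined = mask.dot(demo.columns + ",")
print(joined.tolist())  # ['A,B,', 'A,C,']

# Strip the trailing separator
print(joined.str[:-1].tolist())  # ['A,B', 'A,C']
```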
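To collapse complete groups of columns (for example DK1 and DK2 into DK), create a dictionary that maps each group's full comma-joined column list to its prefix, and then call replace: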
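# Group the column names by their prefix with the digits removed (DK, NO)
s = df.columns.to_series()
s.index = s.index.str.replace(r'\d+', '', regex=True)
d = s.groupby(level=0).agg(','.join).to_dict()
# Invert it: full comma-joined group -> group label
d = {v: k for k, v in d.items()}
print(d)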
{'DK1,DK2': 'DK', 'NO1,NO2,NO3,NO4': 'NO'}
out = (df.eq(df['DK1'], axis=0)
         .dot(df.columns + ',')
         .str[:-1]
         .value_counts()
         .rename_axis('Combo')
         .reset_index(name='Occurrence'))
# Collapse each complete group of columns into its label
out['Combo'] = out['Combo'].replace(d, regex=True)
print(out)
Combo Occurrence
0 DK,NO4 2
1 DK1,NO1 1
2 DK,NO1,NO2 1
3 DK,NO 1
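Since the question asks for something that works on a DataFrame of any size, the same chain can be wrapped in a small helper that compares against the first column by position (frame.iloc[:, 0]) instead of by the name 'DK1', and takes the separator as a parameter so the '-' format from the question can be reproduced. This is a sketch; the function name `count_combos` and its `sep` parameter are hypothetical.

```python
import pandas as pd


def count_combos(frame: pd.DataFrame, sep: str = "-") -> pd.DataFrame:
    """Count how often each set of columns equals the first column in a row.

    Hypothetical helper: compares every column with frame.iloc[:, 0] and
    joins the names of the matching columns with `sep` (must be non-empty).
    """
    mask = frame.eq(frame.iloc[:, 0], axis=0)
    # Append the separator to every name, join via dot, drop the trailing sep
    combos = mask.dot(frame.columns.astype(str) + sep).str[: -len(sep)]
    return (
        combos.value_counts()
        .rename_axis("Combo")
        .reset_index(name="Occurrence")
    )


df1 = pd.DataFrame(
    {
        "DK1": [10, 15, 15, 10, 10],
        "DK2": [10, 10, 15, 10, 10],
        "NO1": [12, 15, 15, 12, 10],
        "NO2": [15, 10, 15, 15, 10],
        "NO3": [15, 10, 15, 15, 15],
        "NO4": [10, 10, 15, 10, 15],
    }
)
print(count_combos(df1))
```

With the question's df1, DK1-DK2-NO4 comes first with a count of 2, followed by the three combinations that occur once; the order among tied counts may vary between pandas versions.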