Identify the number of occurrences of any combination of equal elements in a DataFrame

Question

I have a mock DataFrame, df1, with 6 columns and 5 rows, i.e. with shape (5, 6).

Each column holds the price for one area, and each row is a point in time.

   DK1   DK2   NO1   NO2   NO3   NO4
0  10    10    12    15    15    10
1  15    10    15    10    10    10
2  15    15    15    15    15    15
3  10    10    12    15    15    10
4  10    10    10    10    15    15

Now, in each row, I want to identify the areas with the same price as the first column, "DK1", and then count how often each combination of equal columns occurs. My desired output for the data above would be:

       Combo                  Occurrence
0   DK1-DK2-NO4                   2
1   DK1-DK2-NO1-NO2-NO3-NO4       1
2   DK1-NO1                       1
3   DK1-DK2-NO1-NO2               1

Preferably, the solution should work for a DataFrame of any size. I started with the .apply() method but couldn't get anywhere. I hope you can help.

Recommended answer

Compare the DataFrame against its first column with DataFrame.eq, then use matrix multiplication via DataFrame.dot with the column names plus a separator to join the names of the matching columns in each row. Strip the trailing separator, count the resulting strings with Series.value_counts, and convert the result to a DataFrame:

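# df is the DataFrame from the question (called df1 there).
# The result goes into a new name so that df stays available for later steps.
out = (df.eq(df['DK1'], axis=0)       # True where a cell equals DK1 in the same row
         .dot(df.columns + ',')       # join the names of the True columns
         .str[:-1]                    # drop the trailing separator
         .value_counts()
         .rename_axis('Combo')
         .reset_index(name='Occurrence'))
print(out)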
                     Combo  Occurrence
0              DK1,DK2,NO4           2
1                  DK1,NO1           1
2          DK1,DK2,NO1,NO2           1
3  DK1,DK2,NO1,NO2,NO3,NO4           1
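
To make the individual steps easier to follow, here is a self-contained sketch that rebuilds the sample data from the question and runs the same chain one step at a time. The intermediate names mask, combos and out are illustrative only, and '-' is used as the separator to match the format asked for in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'DK1': [10, 15, 15, 10, 10],
    'DK2': [10, 10, 15, 10, 10],
    'NO1': [12, 15, 15, 12, 10],
    'NO2': [15, 10, 15, 15, 10],
    'NO3': [15, 10, 15, 15, 15],
    'NO4': [10, 10, 15, 10, 15],
})

# Step 1: boolean mask, True where a cell equals DK1 in the same row
mask = df.eq(df['DK1'], axis=0)

# Step 2: dot product with the column names. True * 'DK1-' gives 'DK1-' and
# False * 'DK1-' gives '', so each row becomes the joined names of its True columns.
combos = mask.dot(df.columns + '-').str[:-1]   # [:-1] drops the trailing '-'

# Step 3: count identical combinations and shape the result as a DataFrame
out = (combos.value_counts()
             .rename_axis('Combo')
             .reset_index(name='Occurrence'))
print(out)
```

With this data, DK1-DK2-NO4 gets a count of 2 and the other three combinations a count of 1. The order of the tied rows may differ between pandas versions.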

To collapse a complete group of columns into its shared prefix (for example, DK1,DK2 into DK), build a dictionary that maps each group's joined column names to the prefix, then call replace:

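# Label each column name with its prefix (digits removed); df is the original DataFrame
s = df.columns.to_series()
s.index = s.index.str.replace(r'\d+', '', regex=True)

# Join the column names of each prefix group, then invert the mapping
d = s.groupby(level=0).agg(','.join).to_dict()
d = {v: k for k, v in d.items()}
print(d)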
{'DK1,DK2': 'DK', 'NO1,NO2,NO3,NO4': 'NO'}

out = (df.eq(df['DK1'], axis=0)
         .dot(df.columns + ',')
         .str[:-1]
         .value_counts()
         .rename_axis('Combo')
         .reset_index(name='Occurrence'))

# regex=True makes replace substitute substrings, not only whole values
out['Combo'] = out['Combo'].replace(d, regex=True)
print(out)
        Combo  Occurrence
0      DK,NO4           2
1     DK1,NO1           1
2  DK,NO1,NO2           1
3       DK,NO           1
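
The row DK1,NO1 is left unchanged because the substitution only fires when the full joined name of a group appears in the string. The following minimal sketch isolates that behaviour; the combo strings are typed in by hand from the output of the first answer, and the names cols, combos and collapsed are illustrative.

```python
import pandas as pd

cols = pd.Index(['DK1', 'DK2', 'NO1', 'NO2', 'NO3', 'NO4'])

# Label each column by its prefix, then join the names within each prefix group
s = cols.to_series()
s.index = s.index.str.replace(r'\d+', '', regex=True)
joined = s.groupby(level=0).agg(','.join)

# Invert the mapping: joined names -> prefix
d = {names: prefix for prefix, names in joined.items()}

# Combo strings as produced by the first answer (entered by hand here)
combos = pd.Series([
    'DK1,DK2,NO4',
    'DK1,NO1',
    'DK1,DK2,NO1,NO2',
    'DK1,DK2,NO1,NO2,NO3,NO4',
])

# Substring replacement: only a complete run of a group's columns is collapsed
collapsed = combos.replace(d, regex=True)
print(collapsed.tolist())
```

A partial group such as NO1,NO2 is not collapsed, since it does not contain the full string NO1,NO2,NO3,NO4.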
