Sum values of columns starting with the same string in pandas dataframe

Views: 204. Posted: 2020/5/24 0:40:51. Tags: python, pandas, dataframe, startswith

Question

I have a dataframe with about 100 columns that looks like this:

   Id  Economics-1  English-107  English-2  History-3  Economics-zz  Economics-2  \
0  56            1            1          0          1             0            0
1  11            0            0          0          0             1            0
2   6            0            0          1          0             0            1
3  43            0            0          0          1             0            1
4  14            0            1          0          0             1            0

   Histo  Economics-51  Literature-re  Literatureu4
0      1             0              1             0
1      0             0              0             1
2      0             0              0             0
3      0             1              1             0
4      1             0              0             0

My goal is to keep only the global categories, such as English, History and Literature, and to write the sum of their components' values into this dataframe. For instance, "English" would be the sum of "English-107" and "English-2":

   Id  Economics  English  History  Literature
0  56          1        1        2           1
1  11          1        0        0           1
2   6          0        1        1           0
3  43          2        0        1           1
4  14          0        1        1           0

For this purpose, I have tried two methods. First method:

df = pd.read_csv(file_path, sep='\t')
df['History'] = df.loc[df[df.columns[pd.Series(df.columns).str.startswith('History')]].sum(axes=1)]
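For comparison, a minimal working form of this first attempt, as a sketch: build a boolean mask from the column labels with `str.startswith`, pass it as the column selector of `.loc`, and sum with `axis=1` (the keyword is `axis`, not `axes`). The small frame below is a hypothetical stand-in for the TSV file, and the `History-7` column is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the TSV file, using the question's naming scheme.
df = pd.DataFrame({
    "Id": [56, 11, 6],
    "English-107": [1, 0, 0],
    "English-2": [0, 0, 1],
    "History-3": [1, 0, 0],
    "History-7": [1, 1, 0],  # invented column, for illustration only
})

# Boolean mask over the column labels: True where the label starts with "History".
# Compute it before adding the new "History" column, otherwise a rerun would
# also match the "History" column itself.
mask = df.columns.str.startswith("History")

# .loc[:, mask] keeps every row and only the matching columns;
# sum(axis=1) adds those columns row by row.
df["History"] = df.loc[:, mask].sum(axis=1)
print(df["History"].tolist())
```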

Second method:

df = pd.read_csv(file_path, sep='\t')
filter_col = [col for col in list(df) if col.startswith('History')]
df['History'] = 0 # initialize value, otherwise throws KeyError
for c in df[filter_col]:
    df['History'] = df[filter_col].sum(axes=1)
    print df['History', df[filter_col]]
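A sketch of how the second method might look once corrected, assuming Python 3 and the same naming scheme (the sample data and `History-7` are hypothetical): the loop is unnecessary because one `sum(axis=1)` call already covers all selected columns, and several columns are selected by passing a list of labels rather than a tuple.

```python
import pandas as pd

# Hypothetical sample with the question's naming scheme.
df = pd.DataFrame({
    "Id": [56, 11, 6],
    "History-3": [1, 0, 0],
    "History-7": [1, 1, 0],  # invented column, for illustration only
    "English-2": [0, 0, 1],
})

# Same list comprehension as in the question: list(df) gives the column labels.
filter_col = [col for col in list(df) if col.startswith("History")]

# One vectorized call replaces the loop; no pre-initialization is needed.
df["History"] = df[filter_col].sum(axis=1)

# Python 3 print; select several columns with a list of labels, not a tuple.
print(df[["History"] + filter_col])
```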

However, both give the error:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

My question is: how can I debug this error, or is there another solution to my problem? Note that my dataframe is rather large (about 100 columns and 400,000 rows), so I'm looking for an optimized solution, for example one using loc in pandas.

Recommended answer

I'd suggest a different approach: transpose the dataframe, group its rows (your original columns) by their prefix, sum each group, and transpose the result back.

Consider the following:

import pandas as pd

df = pd.DataFrame({
    'a_a': [1, 2, 3, 4],
    'a_b': [2, 3, 4, 5],
    'b_a': [1, 2, 3, 4],
    'b_b': [2, 3, 4, 5],
})

Now

[s.split('_')[0] for s in df.T.index.values]

gives the prefixes of the columns. So

>>> df.T.groupby([s.split('_')[0] for s in df.T.index.values]).sum().T
    a   b
0   3   3
1   5   5
2   7   7
3   9   9

does what you want.
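does what you want.

With about 400,000 rows, transposing twice builds two wide intermediate copies. One possible alternative, which is not part of the answer above, is to compute each column's prefix once and sum the matching columns per prefix, keeping the frame in its original orientation. Whether this is actually faster depends on the data. A sketch on the same `a_a` / `b_b` frame:

```python
import pandas as pd

df = pd.DataFrame({
    "a_a": [1, 2, 3, 4],
    "a_b": [2, 3, 4, 5],
    "b_a": [1, 2, 3, 4],
    "b_b": [2, 3, 4, 5],
})

# Prefix of every column label, e.g. "a_b" -> "a".
prefixes = df.columns.str.split("_").str[0]

# For each distinct prefix (in order of first appearance), select its columns
# with a boolean mask and sum them row by row.
result = pd.DataFrame({
    p: df.loc[:, prefixes == p].sum(axis=1) for p in prefixes.unique()
})
print(result)
```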

In your case, make sure to split using the '-' character.
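A sketch of the same transpose, groupby and transpose approach applied to a small hypothetical subset of the question's data, splitting on `'-'`. Moving `Id` into the index keeps it out of the grouping, and `reset_index()` turns it back into a column afterwards. Labels without a `'-'`, such as `Histo` or `Literatureu4` in the question's sample, would each form a group of their own.

```python
import pandas as pd

# Small hypothetical subset of the question's data.
df = pd.DataFrame({
    "Id": [56, 11, 6],
    "Economics-1": [1, 0, 0],
    "English-107": [1, 0, 0],
    "English-2": [0, 0, 1],
    "History-3": [1, 0, 0],
    "Economics-zz": [0, 1, 0],
})

# Keep Id out of the sums by moving it into the index.
data = df.set_index("Id")

# Group the transposed rows (the original columns) by the text before the first '-',
# sum each group, then transpose back and restore Id as a column.
prefixes = [s.split("-")[0] for s in data.columns]
result = data.T.groupby(prefixes).sum().T.reset_index()
print(result)
```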
