Python Pandas manipulating dataframe


Question

I have a df that looks like this:

names    col1   col2   col3   total     total_col1      total_col2
 bbb      1      1      0      2         DF1, DF2        DF1
 ccc      1      0      0      1         DF1
 zzz      0      1      1      2                         DF2
 qqq      0      1      0      1                         DF1, Df2
 rrr      0      0      1      1

I want to count the number of items in each total_col# column and add another column, total_full, so the output would be:

names    col1   col2   col3   total  total_full     total_col1      total_col2
 bbb      1      1      0      2          5              2             1   
 ccc      1      0      0      1          2              1                      
 zzz      0      1      1      2          3              1    
 qqq      0      1      0      1          3              2
 rrr      0      0      1      1

So each total_col# column holds the number of DFs listed in it, and total_full sums those columns together with the total column.

Is this possible with pandas?

Recommended answer

You could use

totals = df.filter(regex=r'^total_col')
counts = (totals.stack().str.count(',')+1).unstack()
#    total_col1  total_col2
# 0         2.0         1.0
# 1         1.0         NaN
# 2         NaN         1.0
# 3         NaN         2.0

to count the number of comma-separated strings in each cell of the total_col columns.
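The counting trick relies on a list of n comma-separated items containing n - 1 commas. A minimal sketch on a small Series (the name cells and its values are illustrative, modelled on the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample cells, mirroring the total_col values in the question.
cells = pd.Series(['DF1, DF2', 'DF1', np.nan])

# n comma-separated items contain n - 1 commas, so items = commas + 1.
# Missing cells stay NaN because the .str methods propagate missing values.
item_counts = cells.str.count(',') + 1
print(item_counts)
```

Because the missing cell stays NaN, the result is a float column, which is why the counts print as 2.0 and 1.0 rather than integers.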

To shift the non-NaN values to the front of each row (np.sort places NaN last), you could use

counts_array = np.sort(counts.values, axis=1)
counts = pd.DataFrame(counts_array, columns=counts.columns, index=counts.index)
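A small check of how np.sort treats NaN may help here. The array row_counts below is a hypothetical stand-in for counts.values; NaN entries end up at the end of each row, and the remaining values are sorted in ascending order:

```python
import numpy as np

# Hypothetical per-row counts, with NaN where a cell was empty.
row_counts = np.array([[np.nan, 1.0],
                       [2.0, 1.0],
                       [np.nan, np.nan]])

# Sorting along axis=1 sorts each row independently; NaN is ordered last.
sorted_counts = np.sort(row_counts, axis=1)
print(sorted_counts)
```

Note that the second row comes back as 1.0, 2.0: the sort reorders the counts themselves, not only the NaN values.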


import numpy as np
import pandas as pd
nan = np.nan

df = pd.DataFrame({'col1': [1, 1, 0, 0, 0],
 'col2': [1, 0, 1, 1, 0],
 'col3': [0, 0, 1, 0, 1],
 'names': ['bbb', 'ccc', 'zzz', 'qqq', 'rrr'],
 'total': [2, 1, 2, 1, 1],
 'total_col1': ['DF1, DF2', 'DF1', nan, nan, nan],
 'total_col2': ['DF1', nan, 'DF2', 'DF1, Df2', nan]})

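# Select the total_col1, total_col2, ... columns.
totals = df.filter(regex=r'^total_col')
# Count comma-separated items per cell: number of commas + 1 (NaN stays NaN).
counts = (totals.stack().str.count(',')+1).unstack()
# Sort each row so the counts come first and NaN values come last.
counts_array = np.sort(counts.values, axis=1)
counts = pd.DataFrame(counts_array, columns=counts.columns, index=counts.index)
# Replace the string columns with their counts (aligned on the index).
df[totals.columns] = counts
# Sum total and every total_col* column; sum() skips NaN by default.
df['total_full'] = df.filter(regex=r'^total').sum(axis=1)
print(df)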

yields

   col1  col2  col3 names  total  total_col1  total_col2  total_full
0     1     1     0   bbb      2         1.0         2.0         5.0
1     1     0     0   ccc      1         1.0         NaN         2.0
2     0     1     1   zzz      2         1.0         NaN         3.0
3     0     1     0   qqq      1         2.0         NaN         3.0
4     0     0     1   rrr      1         NaN         NaN         1.0
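In this output, row bbb shows 1.0 then 2.0, whereas the question's expected output has 2 then 1, because np.sort also orders the counts themselves. If the left-to-right order matters, one possible variation (not part of the answer above; nan_mask, order, shifted and shifted_counts are illustrative names) is to stable-sort on the NaN mask only:

```python
import numpy as np
import pandas as pd

# Counts as produced by the stack / str.count / unstack step.
counts = pd.DataFrame({'total_col1': [2.0, 1.0, np.nan, np.nan],
                       'total_col2': [1.0, np.nan, 1.0, 2.0]})

values = counts.to_numpy()
nan_mask = np.isnan(values)
# Stable argsort on the boolean mask: non-NaN (False) positions come first,
# NaN (True) positions last, and ties keep their original column order.
order = np.argsort(nan_mask, axis=1, kind='stable')
shifted = np.take_along_axis(values, order, axis=1)
shifted_counts = pd.DataFrame(shifted, columns=counts.columns, index=counts.index)
print(shifted_counts)
```

Assigning shifted_counts in place of the np.sort result leaves total_full unchanged, since the row sums do not depend on column order.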
