Compute percentage for each row in pandas dataframe


Question

                  country_name  country_code  val_code  \
   United States of America           231                     1   
   United States of America           231                     2   
   United States of America           231                     3   
   United States of America           231                     4   
   United States of America           231                     5   

      y191      y192      y193      y194      y195  \
   47052179  43361966  42736682  43196916  41751928   
   1187385   1201557   1172941   1176366   1192173   
   28211467  27668273  29742374  27543836  28104317   
   179000    193000    233338    276639    249688   
   12613922  12864425  13240395  14106139  15642337 

In the data frame above, I would like to compute, for each row, the percentage of the overall total accounted for by that val_code, resulting in the following data frame.

That is, sum up each row and divide by the total across all rows.

                  country_name  country_code  val_code  \
   United States of America           231                     1   
   United States of America           231                     2   
   United States of America           231                     3   
   United States of America           231                     4   
   United States of America           231                     5  

      perc   
  50.14947129
  1.363631254
  32.48344744
  0.260213146
  15.74323688

Right now I am doing this, but it is not working:

grp_df = df.groupby(['country_name', 'val_code']).agg()

pct_df = grp_df.groupby(level=0).apply(lambda x: 100*x/float(x.sum()))

Recommended answer

Get the total for all the columns of interest and then add the percentage column:

In [35]:
import numpy as np
# .ix was removed in pandas 1.0; .loc performs the same label-based slice
total = np.sum(df.loc[:,'y191':].values)
df['percent'] = df.loc[:,'y191':].sum(axis=1)/total * 100
df

Out[35]:
               country_name  country_code  val_code      y191      y192  \
0  United States of America           231         1  47052179  43361966   
1  United States of America           231         1   1187385   1201557   
2  United States of America           231         1  28211467  27668273   
3  United States of America           231         1    179000    193000   
4  United States of America           231         1  12613922  12864425   

       y193      y194      y195    percent  
0  42736682  43196916  41751928  50.149471  
1   1172941   1176366   1192173   1.363631  
2  29742374  27543836  28104317  32.483447  
3    233338    276639    249688   0.260213  
4  13240395  14106139  15642337  15.743237  

So np.sum sums all of the values:

In [32]:
total = np.sum(df.loc[:,'y191':].values)
total

Out[32]:
434899243

We then call .sum(axis=1)/total * 100 on the columns of interest to sum row-wise, divide by the total, and multiply by 100 to get a percentage.
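For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same approach. It assumes a recent pandas and rebuilds the sample frame by hand from the values in the question. The name year_cols is just an illustrative helper, not something from the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
df = pd.DataFrame({
    "country_name": ["United States of America"] * 5,
    "country_code": [231] * 5,
    "val_code": [1, 2, 3, 4, 5],
    "y191": [47052179, 1187385, 28211467, 179000, 12613922],
    "y192": [43361966, 1201557, 27668273, 193000, 12864425],
    "y193": [42736682, 1172941, 29742374, 233338, 13240395],
    "y194": [43196916, 1176366, 27543836, 276639, 14106139],
    "y195": [41751928, 1192173, 28104317, 249688, 15642337],
})

# Label-based slice: every column from 'y191' through the last column.
# (Hypothetical helper name; taken before 'percent' is added to df.)
year_cols = df.loc[:, "y191":]

# Grand total over all rows and all year columns.
total = year_cols.to_numpy().sum()

# Row sums divided by the grand total, expressed as a percentage.
df["percent"] = year_cols.sum(axis=1) / total * 100

print(total)
print(df[["val_code", "percent"]])
```

Taking the slice once, before the percent column exists, keeps the new column out of the sum if the code is re-run in a different order.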
