Dividing two columns of an unstacked dataframe

Question

I have two columns in a pandas dataframe.

Column 1 is ed and contains strings (e.g. 'a','a','b','c','c','a')

ed column = ['a','a','b','c','c','a'] 

Column 2 is job and also contains strings (e.g. 'aa','bb','aa','aa','bb','cc')

job column = ['aa','bb','aa','aa','bb','cc'] #these are example values from column 2 of my pandas data frame

I then generate a two-way frequency table like this:

my_counts= pdata.groupby(['ed','job']).size().unstack().fillna(0)

How do I divide the frequencies in one column of that frequency table by the frequencies in another column? I want to take that ratio and pass it to argsort() so that I can sort by the calculated ratio, but I don't know how to reference each column of the resulting table.

Recommended answer

I initialized the data as follows:

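import pandas as pd

ed_col = ['a','a','b','c','c','a']
job_col = ['aa','bb','aa','aa','bb','cc']
pdata = pd.DataFrame({'ed':ed_col, 'job':job_col})
# count each (ed, job) pair, pivot job values into columns, fill missing pairs with 0
my_counts = pdata.groupby(['ed','job']).size().unstack().fillna(0)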

Now my_counts looks like this:

job  aa  bb  cc
ed             
a     1   1   1
b     1   0   0
c     1   1   0

To access a column, you can use my_counts.aa or my_counts['aa']. To access a row, you can use my_counts.loc['a'].

So the frequencies of aa divided by those of bb are my_counts['aa'] / my_counts['bb']
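
As a minimal sketch of that division on the example data: the variable name ratio is introduced here for illustration only. One thing worth noticing is that row b has a bb count of 0, so with the float counts produced by fillna(0) its ratio comes out as inf rather than raising an error.

```python
import numpy as np
import pandas as pd

ed_col = ['a', 'a', 'b', 'c', 'c', 'a']
job_col = ['aa', 'bb', 'aa', 'aa', 'bb', 'cc']
pdata = pd.DataFrame({'ed': ed_col, 'job': job_col})
my_counts = pdata.groupby(['ed', 'job']).size().unstack().fillna(0)

# Element-wise division of two columns, aligned on the 'ed' index.
# 'ratio' is an illustrative name, not part of the original answer.
ratio = my_counts['aa'] / my_counts['bb']

# Row 'b' divides by zero, which yields inf for float data.
print(ratio)
print(np.isinf(ratio))
```

If infinite ratios are not wanted, they could be replaced afterwards (for example with ratio.replace(np.inf, np.nan)), depending on how such rows should rank.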

Now, if you want the table sorted by that ratio, you can do:

my_counts.iloc[(my_counts['aa'] / my_counts['bb']).argsort()]
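
For reference, here is a self-contained sketch of the sort on the example data. The names ratio, order, sorted_counts and sorted_alt are illustrative assumptions, and the choice of a stable sort (kind='mergesort') is mine so that tied ratios keep their original order; the original answer uses the default argsort().

```python
import pandas as pd

ed_col = ['a', 'a', 'b', 'c', 'c', 'a']
job_col = ['aa', 'bb', 'aa', 'aa', 'bb', 'cc']
pdata = pd.DataFrame({'ed': ed_col, 'job': job_col})
my_counts = pdata.groupby(['ed', 'job']).size().unstack().fillna(0)

ratio = my_counts['aa'] / my_counts['bb']

# argsort on the underlying array gives integer positions, which iloc accepts.
# A stable sort keeps rows with equal ratios in their original order.
order = ratio.to_numpy().argsort(kind='mergesort')
sorted_counts = my_counts.iloc[order]

# Alternative: sort the ratio Series itself and reuse its index labels.
sorted_alt = my_counts.loc[ratio.sort_values(kind='mergesort').index]

print(sorted_counts)
```

Rows a and c tie at a ratio of 1.0 and row b (division by zero, so inf) sorts last, giving the row order a, c, b in both variants.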
