Dividing two columns of an unstacked dataframe
Question
I have two columns in a pandas dataframe.
Column 1 is ed and contains strings (e.g. 'a','a','b','c','c','a')
ed column = ['a','a','b','c','c','a']
Column 2 is job and also contains strings (e.g. 'aa','bb','aa','aa','bb','cc')
job column = ['aa','bb','aa','aa','bb','cc'] #these are example values from column 2 of my pandas data frame
I then generate a two-column frequency table like this:
my_counts= pdata.groupby(['ed','job']).size().unstack().fillna(0)
Now how do I divide the frequencies in one column of that frequency table by the frequencies in another column? I want to take that ratio and pass it to argsort() so that I can sort by the calculated ratio, but I don't know how to reference each column of the resulting table.
Recommended answer
I initialized the data as follows:
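import pandas as pd

ed_col = ['a', 'a', 'b', 'c', 'c', 'a']
job_col = ['aa', 'bb', 'aa', 'aa', 'bb', 'cc']
pdata = pd.DataFrame({'ed': ed_col, 'job': job_col})
# Count each (ed, job) pair, pivot job into columns, fill missing pairs with 0
my_counts = pdata.groupby(['ed', 'job']).size().unstack().fillna(0)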
Now my_counts looks like this:
job  aa  bb  cc
ed
a     1   1   1
b     1   0   0
c     1   1   0
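As a side note, the same frequency table can probably be built in one step with pd.crosstab, which is a different technique from the groupby/unstack approach used in this answer. The sketch below assumes the example data from the question and only checks that both tables hold the same counts:

```python
import pandas as pd

ed_col = ['a', 'a', 'b', 'c', 'c', 'a']
job_col = ['aa', 'bb', 'aa', 'aa', 'bb', 'cc']
pdata = pd.DataFrame({'ed': ed_col, 'job': job_col})

# Approach from the answer: count pairs, pivot, fill the gaps with 0
my_counts = pdata.groupby(['ed', 'job']).size().unstack().fillna(0)

# Alternative (not from the original answer): crosstab counts the pairs
# directly and uses 0 for combinations that never occur
ct = pd.crosstab(pdata['ed'], pdata['job'])

# Both tables have rows a, b, c and columns aa, bb, cc
same_counts = bool((ct.values == my_counts.values).all())
```

With crosstab the counts stay integers, whereas the unstack/fillna route typically ends up with floats because of the intermediate missing values.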
To access a column, you could use my_counts.aa or my_counts['aa'].
To access a row, you could use my_counts.loc['a'].
So the frequencies of aa divided by bb are my_counts['aa'] / my_counts['bb']
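One thing to watch with this division is a zero in the denominator: in the example data, row b has no bb entries. A minimal sketch of what the ratio looks like, assuming the example data from the question (the names ratio and ratio_clean are only illustrative, and converting inf to NaN is an assumption about what might be wanted, not part of the original answer):

```python
import numpy as np
import pandas as pd

ed_col = ['a', 'a', 'b', 'c', 'c', 'a']
job_col = ['aa', 'bb', 'aa', 'aa', 'bb', 'cc']
pdata = pd.DataFrame({'ed': ed_col, 'job': job_col})
my_counts = pdata.groupby(['ed', 'job']).size().unstack().fillna(0)

# Element-wise division of two columns, aligned on the 'ed' index
ratio = my_counts['aa'] / my_counts['bb']

# Row 'b' divides by zero, which yields inf instead of raising an error.
# Optionally replace such values with NaN so they can be dropped or filled.
ratio_clean = ratio.replace([np.inf, -np.inf], np.nan)
```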
Now, if you want to get it sorted, you can do:
my_counts.iloc[(my_counts['aa'] / my_counts['bb']).argsort()]
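An alternative that may read more clearly is to sort the ratio itself and reindex the table by the resulting labels. This is a sketch under the same example data, not the method from the original answer; the names ratio, order and sorted_counts are illustrative:

```python
import pandas as pd

ed_col = ['a', 'a', 'b', 'c', 'c', 'a']
job_col = ['aa', 'bb', 'aa', 'aa', 'bb', 'cc']
pdata = pd.DataFrame({'ed': ed_col, 'job': job_col})
my_counts = pdata.groupby(['ed', 'job']).size().unstack().fillna(0)

# Ratio per 'ed' group; row 'b' divides by zero and becomes inf
ratio = my_counts['aa'] / my_counts['bb']

# mergesort is stable, so rows with equal ratios keep their original order
order = ratio.sort_values(kind='mergesort').index

# Select rows by label in sorted order
sorted_counts = my_counts.loc[order]
```

Here rows a and c tie at a ratio of 1.0 and row b, with an infinite ratio, ends up last. Passing ascending=False to sort_values would reverse the order.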