R dcast equivalent in python pandas
Question
I am trying to do the equivalent of the following R commands in Python:
test <- data.frame(convert_me=c('Convert1','Convert2','Convert3'),
values=rnorm(3,45, 12), age_col=c('23','33','44'))
test
library(reshape2)
t <- dcast(test, values ~ convert_me+age_col, length )
t
That is, this:
convert_me values age_col
Convert1 21.71502 23
Convert2 58.35506 33
Convert3 60.41639 44
becomes this:
values Convert2_33 Convert1_23 Convert3_44
21.71502 0 1 0
58.35506 1 0 0
60.41639 0 0 1
I know that with dummy variables I can take the values of a column and turn them into column names, but is there a way to combine two columns into the new names easily, as R does?
Recommended answer
You can use the crosstab function for this:
In [14]: pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])
Out[14]:
convert_me Convert1 Convert2 Convert3
age_col 23 33 44
values
21.71502 1 0 0
58.35506 0 1 0
60.41639 0 0 1
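For readers who want to reproduce the output above, here is a minimal self-contained sketch. The frame `df` is rebuilt by hand from the numbers printed in the question (the R code draws them randomly with rnorm), so the literal values are an assumption made for illustration.

```python
import pandas as pd

# Rebuild the question's data frame; the numbers are copied from the
# printed example, since the original R code generates them randomly.
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# Count how often each (convert_me, age_col) pair occurs for each value.
counts = pd.crosstab(index=df["values"],
                     columns=[df["convert_me"], df["age_col"]])
print(counts)
```

Passing a list of Series to `columns` is what produces the two-level column header shown in the transcript.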
Or use pivot_table (with len as the aggregating function, but here you have to fill the NaNs with zeros manually using fillna):
In [18]: df.pivot_table(index=['values'], columns=['age_col', 'convert_me'], aggfunc=len).fillna(0)
Out[18]:
age_col 23 33 44
convert_me Convert1 Convert2 Convert3
values
21.71502 1 0 0
58.35506 0 1 0
60.41639 0 0 1
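If the manual fillna step is unwanted, one possible alternative (not part of the original answer) is to count with groupby and size, then unstack with a fill value. This is only a sketch: `df` is again rebuilt by hand from the question's printed values.

```python
import pandas as pd

# Hand-built copy of the question's data (values assumed from the printout).
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# size() counts the rows in each group; unstack() moves the two key
# levels into the columns and fills missing combinations with 0.
counts = (df.groupby(["values", "convert_me", "age_col"])
            .size()
            .unstack(["convert_me", "age_col"], fill_value=0))
print(counts)
```

Because the fill value is supplied during unstacking, the counts stay integers instead of being upcast to floats by NaN.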
See here for the docs on this: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations
Most functions in pandas will return a multi-level (hierarchical) index, in this case for the columns. If you want to flatten ('melt') this into a single level, as in R, you can do:
In [15]: df_cross = pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])
In [16]: df_cross.columns = ["{0}_{1}".format(l1, l2) for l1, l2 in df_cross.columns]
In [17]: df_cross
Out[17]:
Convert1_23 Convert2_33 Convert3_44
values
21.71502 1 0 0
58.35506 0 1 0
60.41639 0 0 1
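The two steps above (cross-tabulate, then join the column levels) can be wrapped into a small helper. The function name `dcast_counts` and its `sep` parameter are hypothetical, introduced here only for illustration. It is a rough sketch of `dcast(..., length)`, not a full reimplementation.

```python
import pandas as pd

def dcast_counts(frame, index, columns, sep="_"):
    """Count rows per `index` value and per combination of `columns`,
    returning a frame whose column names are joined into one level."""
    table = pd.crosstab(index=frame[index],
                        columns=[frame[c] for c in columns])
    # Join the levels of a MultiIndex into names such as "Convert1_23".
    if isinstance(table.columns, pd.MultiIndex):
        table.columns = [sep.join(map(str, key)) for key in table.columns]
    table.columns.name = None
    # Turn the index back into an ordinary column, as in the R output.
    return table.reset_index()

# Hand-built copy of the question's data (values assumed from the printout).
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

wide = dcast_counts(df, "values", ["convert_me", "age_col"])
print(wide)
```

Unlike the R output in the question, the columns come out in sorted order (Convert1_23, Convert2_33, Convert3_44), because crosstab sorts the column keys.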