Splitting and concatenating dataframes in Python pandas for plotting with rpy2

Question

I have a question about pandas dataframes in Python. I have a large dataframe df from which I take two subsets, df1 and df2. Together, df1 and df2 do not make up all of df; they are just two mutually exclusive subsets of it. I want to plot them with ggplot2 through rpy2 and distinguish the plotted variables by whether they come from df1 or df2. ggplot2 requires a melted dataframe, so I have to create a new dataframe with a column saying whether each entry came from df1 or df2, so that this column can be passed to ggplot. I tried doing it like this:

# add labels to df1, df2
df1["label"] = len(df1.index) * ["df1"]
df2["label"] = len(df2.index) * ["df2"]
# combine the dfs together
melted_df = pandas.concat([df1, df2])

It can now be plotted as:

# plot parameters from melted_df and colour them by df1 or df2
ggplot2.ggplot(melted_df) + ggplot2.ggplot(aes_string(..., colour="label"))

My question is whether there is an easier, shorthand way of doing this. ggplot requires constantly melting and unmelting dataframes, and it seems cumbersome to always manually add the melted form to distinct subsets of df. Thanks.

Recommended answer

Certainly you can simplify by using:

df1['label'] = 'df1'

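(rather than df1["label"] = len(df1.index) * ["df1"]). Assigning a scalar to a column fills every row with that value, so the list does not have to be built by hand.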
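
As a minimal sketch of that scalar assignment, the snippet below labels two small frames and concatenates them. The frames and their column names (x, y) are made-up stand-ins for the df1 and df2 of the question, not data from the original post.

```python
import pandas as pd

# Hypothetical stand-ins for the two subsets; column names are invented.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
df2 = pd.DataFrame({"x": [7, 8], "y": [9.0, 10.0]})

# A scalar on the right-hand side is broadcast to every row of the frame.
df1["label"] = "df1"
df2["label"] = "df2"

# Stack the labelled frames; the "label" column records each row's origin.
melted_df = pd.concat([df1, df2])
print(melted_df)
```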

If you find yourself doing this a lot, why not create your own function? (something like this):

import pandas as pd

def plot_dfs(dfs):
    # label each frame 'df1', 'df2', ... by its position in the list
    for i, df in enumerate(dfs):
        df['label'] = 'df%s' % (i + 1)  # note: this *changes* df
    melted_df = pd.concat(dfs)

    # plot parameters from melted_df and colour them by df1 or df2
    ggplot2.ggplot(melted_df) + ggplot2.ggplot(aes_string(..., colour="label"))

    return # the melted_df or ggplot ?
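
If changing the input frames is a concern (for example, when df1 and df2 are slices of df, in-place assignment can trigger pandas' SettingWithCopyWarning), one possible variation is to label copies instead. The sketch below covers only the labelling and concatenation step and leaves the rpy2 plotting out. The helper name label_and_concat and its names parameter are hypothetical, not part of the original answer.

```python
import pandas as pd

def label_and_concat(dfs, names=None):
    """Hypothetical helper: return one frame with a 'label' column
    naming the source of each row, without modifying the inputs."""
    if names is None:
        # Default labels follow the answer's scheme: df1, df2, ...
        names = ["df%d" % (i + 1) for i in range(len(dfs))]
    # DataFrame.assign returns a new frame, so the originals stay untouched.
    labelled = [df.assign(label=name) for df, name in zip(dfs, names)]
    return pd.concat(labelled, ignore_index=True)

# Usage with made-up data standing in for the question's subsets.
df = pd.DataFrame({"x": range(6), "y": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]})
df1 = df[df["x"] < 2]
df2 = df[df["x"] > 3]
melted_df = label_and_concat([df1, df2])
print(melted_df)
```

The returned frame can then be handed to whatever plotting call is used, with the label column mapped to colour.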
