Concatenating/Merging a List of DataFrames by Predefined Columns


Problem description

I have the following list of DataFrames:

import pandas as pd
# DataFrame.from_items was removed in pandas 1.0, so each frame is built from a dict instead.
rep1 = pd.DataFrame({'Probe': ['x', 'y', 'z'], 'Gene': ['foo', 'bar', 'qux'], 'RP1': [1.00, 23.22, 11.12]})
rep2 = pd.DataFrame({'Probe': ['x', 'y', 'z'], 'Gene': ['foo', 'bar', 'qux'], 'RP2': [11.33, 31.25, 22.12]})
rep3 = pd.DataFrame({'Probe': ['x', 'y', 'z'], 'Gene': ['foo', 'bar', 'qux']})
tmp = [rep1, rep2, rep3]

# In practice the list could hold more than 3 DataFrames.

Which produces:

In [53]: tmp
Out[53]:
[  Probe Gene    RP1
 0     x  foo   1.00
 1     y  bar  23.22
 2     z  qux  11.12,   Probe Gene    RP2
 0     x  foo  11.33
 1     y  bar  31.25
 2     z  qux  22.12,   Probe Gene
 0     x  foo
 1     y  bar
 2     z  qux]

What I want to do is concatenate that list of DataFrames so that it results in this:

  Probe Gene      RP1        RP2
0     x  foo     1.00      11.33
1     y  bar    23.22      31.25
2     z  qux    11.12      22.12

Note that rep3 contains only two columns. It should be discarded automatically during concatenation.

I tried this code, but to no avail. What's the right way to do it?

In [57]: full_df = pd.concat(tmp,axis=1).fillna(0)

In [58]: full_df
Out[58]:
  Probe Gene    RP1 Probe Gene    RP2 Probe Gene
0     x  foo   1.00     x  foo  11.33     x  foo
1     y  bar  23.22     y  bar  31.25     y  bar
2     z  qux  11.12     z  qux  22.12     z  qux

Recommended answer

I'm not sure this is the right way to do it, but a fairly neat approach is to use reduce:

In [10]: from functools import reduce  # needed on Python 3, where reduce is no longer a builtin

In [11]: reduce(pd.merge, tmp)
Out[11]:
  Probe Gene    RP1    RP2
0     x  foo   1.00  11.33
1     y  bar  23.22  31.25
2     z  qux  11.12  22.12
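
With no `on` argument, `pd.merge` joins on whatever columns the two frames share, which here happens to be `Probe` and `Gene`. A minimal sketch that spells the keys out, assuming those two columns are the predefined join columns (the names `KEYS` and `merge_on_keys` are illustrative and not part of the original answer):

```python
from functools import reduce

import pandas as pd

# Assumed join columns, taken from the example data in the question.
KEYS = ["Probe", "Gene"]

rep1 = pd.DataFrame({"Probe": ["x", "y", "z"],
                     "Gene": ["foo", "bar", "qux"],
                     "RP1": [1.00, 23.22, 11.12]})
rep2 = pd.DataFrame({"Probe": ["x", "y", "z"],
                     "Gene": ["foo", "bar", "qux"],
                     "RP2": [11.33, 31.25, 22.12]})
rep3 = pd.DataFrame({"Probe": ["x", "y", "z"],
                     "Gene": ["foo", "bar", "qux"]})
tmp = [rep1, rep2, rep3]


def merge_on_keys(left, right):
    """Inner-join two frames on the explicit key columns."""
    return pd.merge(left, right, on=KEYS, how="inner")


# Fold the list pairwise from left to right.
full_df = reduce(merge_on_keys, tmp)
print(full_df)
```

Because rep3 carries only the key columns, merging it adds no new columns, so it effectively drops out of the result. Naming the keys also guards against an accidental join on some other column that two frames happen to share.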


This is basically equivalent to:

tmp[0].merge(tmp[1]).merge(tmp[2])...

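Note: this means that if tmp contains many DataFrames, it may be less efficient than concat, since each merge in the chain builds an intermediate DataFrame.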
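
For comparison, a concat-based variant might look like the sketch below. It is not from the original answer, and it assumes that `Probe` and `Gene` are the key columns and that each (Probe, Gene) pair appears at most once per DataFrame. Moving the keys into the index lets `pd.concat` align rows on them instead of repeating the key columns:

```python
import pandas as pd

# Assumed join columns, taken from the example data in the question.
KEYS = ["Probe", "Gene"]

rep1 = pd.DataFrame({"Probe": ["x", "y", "z"],
                     "Gene": ["foo", "bar", "qux"],
                     "RP1": [1.00, 23.22, 11.12]})
rep2 = pd.DataFrame({"Probe": ["x", "y", "z"],
                     "Gene": ["foo", "bar", "qux"],
                     "RP2": [11.33, 31.25, 22.12]})
rep3 = pd.DataFrame({"Probe": ["x", "y", "z"],
                     "Gene": ["foo", "bar", "qux"]})
tmp = [rep1, rep2, rep3]

# Move the key columns into the index so concat aligns rows on them.
indexed = [df.set_index(KEYS) for df in tmp]

# join="inner" keeps only key pairs present in every frame, like the merge chain.
full_df = pd.concat(indexed, axis=1, join="inner").reset_index()
print(full_df)
```

After `set_index`, rep3 has no value columns left, so it contributes nothing to the concatenated result. This does the alignment in a single call rather than one merge per DataFrame.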
