Combining two DataFrames in Python pandas


Question

I want to combine two separate DataFrames of the following shape in Python pandas:

Df1=
       A    B
    1  1    2
    2  3    4
    3  5    6

Df2 = 
       C    D
    1  a    b
    2  c    d
    3  e    f

I would like to end up with the following:

df = 
       A    B    C    D
   1   1    2    a    b
   2   3    4    c    d
   3   5    6    e    f

I am using the following code:

dat = df1.join(df2)

The problem is that my actual DataFrames have more than 2 million rows, and at that size the join takes too long and consumes a huge amount of memory.

Is there a faster and more memory-efficient way to do this?

Thanks in advance for your help.

Recommended answer

If I've read your question correctly, your indexes align exactly and you just need to combine the columns into a single DataFrame. If that's right, copying columns over from one DataFrame to the other turns out to be the fastest approach (see timings [92] and [93]). In the example below, f is my DataFrame:

In [86]: len(f)
Out[86]: 343720

In [87]: a = f.loc[:, ['date_val', 'price']]
In [88]: b = f.loc[:, ['red_date', 'credit_spread']]

In [89]: %timeit c = pd.concat([a, b], axis=1)
100 loops, best of 3: 7.11 ms per loop

In [90]: %timeit c = pd.concat([a, b], axis=1, ignore_index=True)
100 loops, best of 3: 10.8 ms per loop

In [91]: %timeit c = a.join(b)
100 loops, best of 3: 6.47 ms per loop

In [92]: %timeit a['red_date'] = b['red_date']
1000 loops, best of 3: 1.17 ms per loop

In [93]: %timeit a['credit_spread'] = b['credit_spread']
1000 loops, best of 3: 1.16 ms per loop
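
Applied to the small frames from the question, the column-copy approach might look like the sketch below. The names df1 and df2 follow the question; the explicit index check, the copy, and the loop over columns are illustrative choices rather than part of the original answer. The sketch assumes both frames share an identical index, because column assignment aligns on index labels.

```python
import pandas as pd

# Toy frames shaped like the ones in the question
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]}, index=[1, 2, 3])
df2 = pd.DataFrame({"C": ["a", "c", "e"], "D": ["b", "d", "f"]}, index=[1, 2, 3])

# Column assignment aligns on the index, so verify the assumption first
assert df1.index.equals(df2.index)

df = df1.copy()  # drop the copy to add the columns to df1 in place
for col in df2.columns:
    df[col] = df2[col]  # copy one column at a time

print(df)
```

If the indexes did not match, the assignment would still run but would fill unmatched rows with NaN, which is why the check comes first.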

I also tried copying both columns at once, but for some strange reason it was more than twice as slow as copying each column individually.

In [94]: %timeit a[['red_date', 'credit_spread']] = b[['red_date', 'credit_spread']]
100 loops, best of 3: 5.09 ms per loop
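
The timings above come from one machine and one pandas version, so it is worth re-measuring on your own data. Below is a minimal sketch of such a comparison using the standard timeit module. The synthetic frames, the row count n, and the helper names by_concat, by_join and by_assignment are all assumptions made for illustration. Unlike the in-place timings above, by_assignment copies the left frame first so that every run starts from the same state, which adds to its measured cost.

```python
import timeit

import numpy as np
import pandas as pd

n = 200_000  # synthetic row count, chosen only for illustration
rng = np.random.default_rng(0)
a = pd.DataFrame({"A": rng.integers(0, 100, n), "B": rng.random(n)})
b = pd.DataFrame({"C": rng.random(n), "D": rng.integers(0, 100, n)})


def by_concat():
    return pd.concat([a, b], axis=1)


def by_join():
    return a.join(b)


def by_assignment():
    out = a.copy()  # copy so repeated runs start from the same frame
    for col in b.columns:
        out[col] = b[col]
    return out


results = {}
for fn in (by_concat, by_join, by_assignment):
    # Best of 3 repeats, 5 calls each, reported as seconds per call
    results[fn.__name__] = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {results[fn.__name__] * 1e3:.2f} ms per call")
```

The relative ordering may differ from the numbers quoted above, so treat the output as a measurement of your own environment rather than a confirmation of them.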
