Adding 2 data frames in Python pandas
Question
I want to combine 2 separate data frames of the following shape in Python pandas:
Df1 =
   A  B
1  1  2
2  3  4
3  5  6
Df2 =
   C  D
1  a  b
2  c  d
3  e  f
I would like to end up with the following:
df =
   A  B  C  D
1  1  2  a  b
2  3  4  c  d
3  5  6  e  f
I am using the following code:
dat = df1.join(df2)
The problem is that my actual data frames have more than 2 million rows, and for that size the join takes too long and consumes a huge amount of memory.
Is there a faster, more memory-efficient way to do this?
Thanks in advance for your help.
Recommended answer
If I've read your question correctly, your indexes align exactly and you just need to combine columns into a single DataFrame. If that's right, then copying the columns over from one DataFrame to the other turned out to be the fastest approach in my timings (see In [92] and In [93] below). f is my DataFrame in the example below:
In [85]: len(f)
Out[86]: 343720
In [87]: a = f.loc[:, ['date_val', 'price']]
In [88]: b = f.loc[:, ['red_date', 'credit_spread']]
In [89]: %timeit c = pd.concat([a, b], axis=1)
100 loops, best of 3: 7.11 ms per loop
In [90]: %timeit c = pd.concat([a, b], axis=1, ignore_index=True)
100 loops, best of 3: 10.8 ms per loop
In [91]: %timeit c = a.join(b)
100 loops, best of 3: 6.47 ms per loop
In [92]: %timeit a['red_date'] = b['red_date']
1000 loops, best of 3: 1.17 ms per loop
In [93]: %timeit a['credit_spread'] = b['credit_spread']
1000 loops, best of 3: 1.16 ms per loop
I also tried copying both columns at once, but for some strange reason it was more than twice as slow as copying each column individually.
In [94]: %timeit a[['red_date', 'credit_spread']] = b[['red_date', 'credit_spread']]
100 loops, best of 3: 5.09 ms per loop
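Applied to the frames from the question, the column-by-column copy might look like the sketch below. It is only an illustration under stated assumptions: df1 and df2 are rebuilt here from the small example in the question, and the indexes are assumed to be identical. Column assignment aligns on the index, so labels missing from df2 would produce NaN rather than an error, which is why the sketch checks the indexes first.

```python
import pandas as pd

# Small stand-ins for the question's frames (assumed shapes and values).
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]}, index=[1, 2, 3])
df2 = pd.DataFrame({"C": ["a", "c", "e"], "D": ["b", "d", "f"]}, index=[1, 2, 3])

# The column copy is only equivalent to a join when the indexes match exactly.
if not df1.index.equals(df2.index):
    raise ValueError("indexes differ; use df1.join(df2) or pd.concat instead")

# Copy here so df1 stays untouched; assign into df1 directly to avoid
# the extra copy when memory is tight.
df = df1.copy()
for col in df2.columns:
    df[col] = df2[col]  # one column at a time, as in In [92] and In [93]

print(df)
```

Timings will depend on the pandas version and the column dtypes, so it is worth re-running %timeit on your own data before settling on this over join or concat.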