Python - Efficient way to add rows to dataframe
Question
From this question and others, it seems that using concat or append to build a pandas DataFrame row by row is not recommended, because each call recopies the whole DataFrame.
My project involves retrieving a small amount of data every 30 seconds. It might run over a 3-day weekend, so it could easily produce more than 8000 rows, created one row at a time. What would be the most efficient way to add rows to this DataFrame?
Recommended answer
You can add rows to a DataFrame in place by assigning with loc to an index label that does not exist yet. From the pandas documentation:
In [119]: dfi
Out[119]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
In [120]: dfi.loc[3] = 5
In [121]: dfi
Out[121]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5
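The documentation excerpt above can be reproduced as a standalone script. This is a minimal sketch: the frame is built directly from a dict here (an assumption, the documentation constructs dfi differently), and the second assignment with a list of per-column values is an added illustration, not part of the quoted excerpt.

```python
import pandas as pd

# Same starting values as the documentation excerpt (constructed by hand here).
dfi = pd.DataFrame({"A": [0, 2, 4], "B": [1, 3, 5], "C": [0, 2, 4]})

# Label 3 does not exist yet, so this adds a row; the scalar fills every column.
dfi.loc[3] = 5

# A list supplies one value per column. len(dfi) is the next unused label
# only because the index is the default 0, 1, 2, ...
dfi.loc[len(dfi)] = [6, 7, 6]

print(dfi)
```

If the label already exists, the same assignment overwrites that row instead of adding one, so the label must be new for the frame to grow.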
As expected, using loc is considerably faster than append (about 14x):
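import pandas as pd
df = pd.DataFrame({"A": [1,2,3], "B": [1,2,3], "C": [1,2,3]})
%%timeit
# append returns a new DataFrame and leaves df unchanged.
# DataFrame.append was removed in pandas 2.0; pd.concat([df, df2]) is the equivalent there.
df2 = pd.DataFrame({"A": [4], "B": [4], "C": [4]})
df.append(df2)
# 1000 loops, best of 3: 1.61 ms per loop
%%timeit
# Assigning to a new label with loc enlarges df in place.
df.loc[3] = 4
# 10000 loops, best of 3: 113 µs per loop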
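The timings above come from IPython cell magics, and repeated runs of df.loc[3] = 4 overwrite label 3 after the first run rather than adding a row each time. The following is a hedged sketch of a plain-Python comparison in which every iteration really adds a row. The function names and the row count are hypothetical, pd.concat stands in for append (which no longer exists in pandas 2.0 and later), and the third function, which collects rows in a list and builds the frame once, is a commonly suggested alternative that is not part of the original answer. No timings are quoted because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd


def grow_with_concat(n_rows):
    """Add one row per iteration with pd.concat (a new frame on every call)."""
    df = pd.DataFrame({"A": [1], "B": [1], "C": [1]})
    for i in range(n_rows):
        row = pd.DataFrame({"A": [i], "B": [i], "C": [i]})
        df = pd.concat([df, row], ignore_index=True)
    return df


def grow_with_loc(n_rows):
    """Add one row per iteration by assigning to a new label with loc."""
    df = pd.DataFrame({"A": [1], "B": [1], "C": [1]})
    for i in range(n_rows):
        df.loc[len(df)] = [i, i, i]
    return df


def build_from_list(n_rows):
    """Collect plain dicts and build the frame once at the end."""
    rows = [{"A": 1, "B": 1, "C": 1}]
    for i in range(n_rows):
        rows.append({"A": i, "B": i, "C": i})
    return pd.DataFrame(rows)


if __name__ == "__main__":
    n = 200  # hypothetical row count, kept small so the script finishes quickly
    for fn in (grow_with_concat, grow_with_loc, build_from_list):
        seconds = timeit.timeit(lambda: fn(n), number=3)
        print(f"{fn.__name__}: {seconds:.4f} s for 3 runs of {n} rows")
```

For a collector that receives one record every 30 seconds, the list-based variant also keeps the per-record work independent of how many rows have already been gathered, at the cost of only having a DataFrame once it is built.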