Improve Row Append Performance On Pandas DataFrames
Problem description
I am running a basic script that loops over a nested dictionary, grabs data from each record, and appends it to a Pandas DataFrame. The data looks something like this:
data = {"SomeCity": {"Date1": {record1, record2, record3, ...}, "Date2": {}, ...}, ...}
In total it has a few million records. The script itself looks like this:
from pandas import DataFrame, Series

cities = ["SomeCity"]
FredDF = DataFrame({}, columns=['Date', 'HouseID', 'Price'])
for city in cities:
    for dateRun in data[city]:
        for record in data[city][dateRun]:
            recSeries = Series([record['Timestamp'],
                                record['Id'],
                                record['Price']],
                               index=['Date', 'HouseID', 'Price'])
            # appending one row at a time inside the loop: this is the slow step
            FredDF = FredDF.append(recSeries, ignore_index=True)
This runs painfully slow, however. Before I look for a way to parallelize it, I just want to make sure I'm not missing something obvious that would make this perform faster as it is, as I'm still quite new to Pandas.
Recommended answer
I also used the DataFrame's append function inside a loop, and I was perplexed by how slowly it ran.
Here is a useful example for anyone suffering from the same problem, based on the correct answer on this page.
Python version: 3
Pandas version: 0.20.3
from pandas import DataFrame

# the dictionary to pass to pandas' DataFrame
# (named "rows" rather than "dict" to avoid shadowing the built-in)
rows = {}
# a counter used as the key for each entry added to "rows"
i = 0
# Example data to loop over and append to a dataframe
data = [{"foo": "foo_val_1", "bar": "bar_val_1"},
        {"foo": "foo_val_2", "bar": "bar_val_2"}]
# the loop
for entry in data:
    # add a dictionary entry to the final dictionary
    rows[i] = {"col_1_title": entry['foo'], "col_2_title": entry['bar']}
    # increment the counter
    i = i + 1
# create the dataframe using 'from_dict'
# important to set the 'orient' parameter to "index" so that the keys become rows
df = DataFrame.from_dict(rows, "index")
The "from_dict" function: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html
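As a rough sketch, the same idea might be applied to the nested data from the question like this. It assumes each date maps to a list of records that are dicts with 'Timestamp', 'Id' and 'Price' keys. The sample values and the name "rows" are made up for illustration and are not from the original post.

```python
from pandas import DataFrame

# Hypothetical sample data in the shape described in the question;
# the field names come from the question, the values are invented.
data = {
    "SomeCity": {
        "Date1": [
            {"Timestamp": "2017-01-01", "Id": 1, "Price": 100000},
            {"Timestamp": "2017-01-01", "Id": 2, "Price": 150000},
        ],
        "Date2": [
            {"Timestamp": "2017-01-02", "Id": 3, "Price": 125000},
        ],
    }
}
cities = ["SomeCity"]

# Collect every row in a plain dictionary first; no DataFrame is touched here.
rows = {}
for city in cities:
    for date_run in data[city]:
        for record in data[city][date_run]:
            # len(rows) serves as the next integer row key
            rows[len(rows)] = {"Date": record["Timestamp"],
                               "HouseID": record["Id"],
                               "Price": record["Price"]}

# Build the DataFrame once, after the loop, instead of appending inside it.
df = DataFrame.from_dict(rows, orient="index")
```

The DataFrame is created a single time at the end, so the loop itself only does ordinary dictionary insertions.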