Improve Row Append Performance On Pandas DataFrames

Question

I am running a basic script that loops over a nested dictionary, grabs data from each record, and appends it to a Pandas DataFrame. The data looks something like this:

data = {"SomeCity": {"Date1": {record1, record2, record3, ...}, "Date2": {}, ...}, ...}

In total it has a few million records. The script itself looks like this:

from pandas import DataFrame, Series

cities = ["SomeCity"]
FredDF = DataFrame({}, columns=['Date', 'HouseID', 'Price'])
for city in cities:
    for dateRun in data[city]:
        for record in data[city][dateRun]:
            recSeries = Series([record['Timestamp'], 
                                record['Id'], 
                                record['Price']],
                                index = ['Date', 'HouseID', 'Price'])
            # slow: append copies the whole DataFrame on every iteration
            FredDF = FredDF.append(recSeries, ignore_index=True)

This runs painfully slow, however. Before I look for a way to parallelize it, I just want to make sure I'm not missing something obvious that would make this perform faster as it is, as I'm still quite new to Pandas.

Recommended Answer

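I also used the DataFrame's append function inside a loop and was perplexed by how slowly it ran. Each call to append returns a new DataFrame and copies all the existing rows, so the total cost grows roughly quadratically with the number of records.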

A useful example for those who are suffering, based on the correct answer on this page.

Python version: 3

Pandas version: 0.20.3

from pandas import DataFrame

# the dictionary to pass to pandas dataframe
d = {}

# a counter to use to add entries to "dict"
i = 0 

# Example data to loop and append to a dataframe
data = [{"foo": "foo_val_1", "bar": "bar_val_1"}, 
       {"foo": "foo_val_2", "bar": "bar_val_2"}]

# the loop
for entry in data:

    # add a dictionary entry to the final dictionary
    d[i] = {"col_1_title": entry['foo'], "col_2_title": entry['bar']}
    
    # increment the counter
    i = i + 1

# create the dataframe using 'from_dict'
# important to set the 'orient' parameter to "index" so that each key becomes a row
df = DataFrame.from_dict(d, orient="index")

The "from_dict" function: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html
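
The same idea can be applied to the nested dictionary from the question. The following is only a minimal sketch: the sample records are made up, and it assumes each date maps to a list of record dictionaries with 'Timestamp', 'Id' and 'Price' keys, as the question's loop implies. Rows are collected in a plain Python list and the DataFrame is built once at the end, which also avoids DataFrame.append (removed in pandas 2.0).

```python
import pandas as pd

# Hypothetical sample shaped like the question's data:
# city -> run date -> list of record dictionaries (assumed layout)
data = {
    "SomeCity": {
        "Date1": [
            {"Timestamp": "2017-01-01", "Id": 1, "Price": 100.0},
            {"Timestamp": "2017-01-01", "Id": 2, "Price": 250.0},
        ],
        "Date2": [
            {"Timestamp": "2017-01-02", "Id": 1, "Price": 105.0},
        ],
    }
}
cities = ["SomeCity"]

# Collect one small dict per record; appending to a list is cheap
rows = []
for city in cities:
    for date_run in data[city]:
        for record in data[city][date_run]:
            rows.append({
                "Date": record["Timestamp"],
                "HouseID": record["Id"],
                "Price": record["Price"],
            })

# Build the DataFrame a single time, after the loop
df = pd.DataFrame(rows, columns=["Date", "HouseID", "Price"])
```

A list of dictionaries is used here instead of a dictionary keyed by a counter; both work, but the list removes the need to maintain the counter by hand.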
