What is the most efficient way to loop through dataframes with pandas?


Question

I want to perform my own complex operations on financial data in dataframes, processing the rows in a sequential manner.

For example, I am using the following MSFT CSV file taken from Yahoo Finance:

Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27

....

I then do the following:

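#!/usr/bin/env python
import pandas as pd

# Use the Date column as the index so that df.index[i] is the row's date
df = pd.read_csv('table.csv', index_col='Date', parse_dates=True)

for i, row in enumerate(df.values):
    date = df.index[i]
    # Six value columns remain once Date is the index; open_ avoids shadowing the builtin open()
    open_, high, low, close, volume, adjclose = row
    # now perform analysis on open/close based on date, etc.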

Is this the most efficient way? Given the focus on speed in pandas, I would assume there must be some dedicated function for iterating over the values while also retrieving the index (possibly through a generator, to be memory efficient). Unfortunately, df.iteritems only iterates column by column.

Recommended answer

Newer versions of pandas include a built-in method, iterrows(), for iterating over rows.

for index, row in df.iterrows():
    # index is the row label; row is a Series holding that row's values
    # do some logic here
    pass

Or, if you want it to be faster, use itertuples(), which yields each row as a namedtuple instead of building a Series per row.
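As a rough illustration of the two row iterators, here is a minimal sketch that runs both over the sample rows from the question. The inline CSV string, the variable names, and the high-minus-low calculation are assumptions made for the example and are not part of the original answer.

```python
import io

import pandas as pd

# Sample rows copied from the question's CSV excerpt (inlined so the sketch is self-contained)
CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27
"""

df = pd.read_csv(io.StringIO(CSV), index_col="Date", parse_dates=True)

# iterrows(): each row arrives as (index label, Series); values are read by column label
ranges_iterrows = [(date, row["High"] - row["Low"]) for date, row in df.iterrows()]

# itertuples(): each row arrives as a namedtuple; the index is exposed as row.Index.
# Column names that are not valid identifiers (such as "Adj Close") are replaced
# by positional names, so only identifier-safe columns are used here.
ranges_itertuples = [(row.Index, row.High - row.Low) for row in df.itertuples()]

for date, day_range in ranges_itertuples:
    print(date.date(), round(day_range, 2))
```

Both loops produce the same values. The itertuples() version usually runs faster because it does not construct a Series for every row.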

However, unutbu's suggestion to use NumPy functions and avoid iterating over rows altogether will produce the fastest code.
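unutbu's actual code is not reproduced here, so the following is only a sketch of what avoiding row iteration can look like on the question's data. The derived columns (Range, LogReturn, UpDay) are hypothetical examples chosen for illustration, not calculations from the original thread.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question's CSV excerpt (inlined so the sketch is self-contained)
CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27
"""

df = pd.read_csv(io.StringIO(CSV), index_col="Date", parse_dates=True)
df = df.sort_index()  # oldest row first, so differences run forward in time

# Whole-column arithmetic: one operation covers every row
df["Range"] = df["High"] - df["Low"]

# NumPy ufuncs apply element-wise; diff() compares each row with the previous one
df["LogReturn"] = np.log(df["Close"]).diff()

# np.where replaces a per-row if/else
df["UpDay"] = np.where(df["Close"] > df["Open"], 1, 0)

print(df[["Range", "LogReturn", "UpDay"]])
```

Logic that depends on the result computed for the previous row cannot always be expressed this way. In that case itertuples() remains a reasonable fallback.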
