What is the most efficient way to loop through dataframes with pandas?


Problem description




I want to perform my own complex operations on financial data in dataframes in a sequential manner.

For example I am using the following MSFT CSV file taken from Yahoo Finance:

Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27

....

I then do the following:

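#!/usr/bin/env python
import pandas as pd

# Use the Date column as the index so that df.index[i] is the row's date
df = pd.read_csv('table.csv', index_col='Date', parse_dates=True)

for i, row in enumerate(df.values):
    date = df.index[i]
    # six value columns remain once Date is the index
    open_, high, low, close, volume, adjclose = row
    # now perform analysis on open/close based on date, etc.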

Is that the most efficient way? Given the focus on speed in pandas, I would assume there must be some special function to iterate through the values in a manner that one also retrieves the index (possibly through a generator to be memory efficient)? df.iteritems unfortunately only iterates column by column.

Solution

Newer versions of pandas include a built-in method for iterating over rows:

for index, row in df.iterrows():
    # do some logic here
    pass

Or, if you want it to be faster, use itertuples().
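As a rough illustration of the itertuples() route, here is a minimal sketch using the four sample rows from the question, inlined so it runs without table.csv. The analysis itself (collecting the dates where the close was above the open) and the name up_days are made up for the example and are not part of the original question.

```python
import io

import pandas as pd

# The four sample rows from the question, inlined in place of table.csv.
CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27
"""

df = pd.read_csv(io.StringIO(CSV), index_col="Date", parse_dates=True)

# Hypothetical analysis: collect the dates on which the close was above the open.
up_days = []
for row in df.itertuples():
    # row.Index is the index label (the date); columns are exposed as attributes.
    if row.Close > row.Open:
        up_days.append(row.Index)

print(up_days)
```

Each row comes back as a lightweight namedtuple rather than a Series, which is where most of the speed-up over iterrows() comes from. Column names that are not valid Python identifiers, such as "Adj Close", are replaced with positional names, so renaming such columns first may be more convenient.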

However, unutbu's suggestion to use NumPy functions and avoid iterating over rows altogether will produce the fastest code.
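unutbu's answer is not reproduced on this page, so the following is only a sketch of what avoiding row iteration can look like, not that answer's code. It assumes the analysis can be expressed as whole-column operations; the derived columns Range, Change and Direction are invented for the illustration.

```python
import io

import numpy as np
import pandas as pd

# The four sample rows from the question, inlined in place of table.csv.
CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2011-10-19,27.37,27.47,27.01,27.13,42880000,27.13
2011-10-18,26.94,27.40,26.80,27.31,52487900,27.31
2011-10-17,27.11,27.42,26.85,26.98,39433400,26.98
2011-10-14,27.31,27.50,27.02,27.27,50947700,27.27
"""

df = pd.read_csv(io.StringIO(CSV), index_col="Date", parse_dates=True)

# Whole-column arithmetic: computed for every row at once, with no Python-level loop.
df["Range"] = df["High"] - df["Low"]
df["Change"] = df["Close"] - df["Open"]

# np.where evaluates the condition over the entire column in a single call.
df["Direction"] = np.where(df["Change"] > 0, "up", "down")

print(df[["Range", "Change", "Direction"]])
```

The index (the dates) stays attached to every derived column, so there is no need to look it up row by row. Logic that depends on the result computed for the previous row does not always fit this pattern, in which case itertuples() is a reasonable fallback.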
