Why doesn't this function "take" after I iterrows over a pandas DataFrame?


Problem description

I have a DataFrame with timestamped temperature and wind speed values, and a function to convert those into a "wind chill." I'm using iterrows to run the function on each row, hoping to get a DataFrame out with a nifty "Wind Chill" column.

However, while it seems to work as it's going through, and has actually "worked" at least once, I can't seem to replicate it consistently. I feel like I'm missing something about the structure of DataFrames in general, but I'm hoping someone can help.

In [28]: bigdf.head()
Out[28]: 


                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00  1    30.27        5.27        2003
2003-03-01 07:00:00-05:00  1    30.21        4.83        2003
2003-03-01 08:00:00-05:00  1    31.81        6.09        2003
2003-03-01 09:00:00-05:00  1    34.04        6.61        2003
2003-03-01 10:00:00-05:00  1    35.31        6.97        2003

So I add a 'Wind Chill' column to bigdf and prepopulate it with NaN.

In [29]: bigdf['Wind Chill'] = NaN

Then I try to iterate over the rows to add the actual wind chills.

In [30]: for row_index, row in bigdf[:5].iterrows():
    ...:     row['Wind Chill'] = windchill(row['Temperature'], row['Wind Speed'])
    ...:     print(row['Wind Chill'])
    ...:
24.7945889994
25.1365267133
25.934114012
28.2194307516
29.5051046953

As you can see, the new values appear to be applied to the 'Wind Chill' column. Here's the windchill function, in case that helps:

def windchill(temp, wind):
    if temp>50 or wind<=3:
        return temp
    else:
        return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16

But when I look at the DataFrame again, the NaNs are still there:

In [31]: bigdf.head()
Out[31]: 

                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00  1    30.27        5.27        2003  NaN
2003-03-01 07:00:00-05:00  1    30.21        4.83        2003  NaN
2003-03-01 08:00:00-05:00  1    31.81        6.09        2003  NaN
2003-03-01 09:00:00-05:00  1    34.04        6.61        2003  NaN
2003-03-01 10:00:00-05:00  1    35.31        6.97        2003  NaN

What's even weirder is that it has worked once or twice, and I can't tell what I did differently.

I must admit I'm not especially familiar with the inner workings of pandas, and I get confused by indexing, etc., so I feel like I'm probably missing something very basic here (or doing this the hard way).

Thanks!

Recommended answer

You can use apply to do this:

In [11]: df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
                 axis=1)
Out[11]:
2003-03-01 06:00:00-05:00    24.794589
2003-03-01 07:00:00-05:00    25.136527
2003-03-01 08:00:00-05:00    25.934114
2003-03-01 09:00:00-05:00    28.219431
2003-03-01 10:00:00-05:00    29.505105

In [12]: df['Wind Chill'] = df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
                                    axis=1)

In [13]: df
Out[13]:
                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003   24.794589
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003   25.136527
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003   25.934114
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003   28.219431
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003   29.505105
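As an aside to the apply approach, the same formula can also be written without a per-row Python call. The sketch below is not part of the original answer: it swaps the if/else for `numpy.where`, rebuilds the five sample rows by hand (with a timezone-naive index for simplicity), and uses a hypothetical helper name, `windchill_vectorized`. It also recomputes the column with apply so the two results can be compared.

```python
import numpy as np
import pandas as pd


def windchill(temp, wind):
    # Scalar version, as given in the question.
    if temp > 50 or wind <= 3:
        return temp
    return 35.74 + 0.6215 * temp - 35.75 * wind**0.16 + 0.4275 * temp * wind**0.16


def windchill_vectorized(temp, wind):
    # Hypothetical helper: same formula applied to whole columns at once.
    formula = 35.74 + 0.6215 * temp - 35.75 * wind**0.16 + 0.4275 * temp * wind**0.16
    # Where the formula does not apply, fall back to the raw temperature.
    return np.where((temp > 50) | (wind <= 3), temp, formula)


# The five sample rows from the question (index kept timezone-naive here).
index = pd.to_datetime([
    "2003-03-01 06:00", "2003-03-01 07:00", "2003-03-01 08:00",
    "2003-03-01 09:00", "2003-03-01 10:00",
])
df = pd.DataFrame(
    {
        "Day": [1, 1, 1, 1, 1],
        "Temperature": [30.27, 30.21, 31.81, 34.04, 35.31],
        "Wind Speed": [5.27, 4.83, 6.09, 6.61, 6.97],
        "Year": [2003, 2003, 2003, 2003, 2003],
    },
    index=index,
)

df["Wind Chill"] = windchill_vectorized(df["Temperature"], df["Wind Speed"])

# The row-wise apply from the answer, kept for comparison.
via_apply = df.apply(
    lambda row: windchill(row["Temperature"], row["Wind Speed"]), axis=1
)
print(df)
```

On a frame this small the difference is negligible; the column-wise form mainly pays off on large frames, at the cost of evaluating the formula for every row, including those where the raw temperature is used instead.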


To expand on the reason for your confusion, I think it stems from the fact that the row variables are copies rather than views of the df, so changes don't propagate:

In [21]: for _, row in df.iterrows(): row['Day'] = 2

We see that the change is made successfully to the copy, i.e. the row variable:

In [22]: row
Out[22]:
Day               2.00
Temperature      35.31
Wind Speed        6.97
Year           2003.00
Name: 2003-03-01 10:00:00-05:00

But the changes don't propagate to the DataFrame:

In [23]: df
Out[23]:
                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003
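If you do want to keep the explicit loop, one way around the copy problem is to write into the DataFrame itself by label rather than into `row`. The following is a minimal sketch under stated assumptions, not code from the original answer: it rebuilds the five sample rows by hand with a timezone-naive index and uses `DataFrame.at` for the scalar assignment.

```python
import numpy as np
import pandas as pd


def windchill(temp, wind):
    # Scalar version, as given in the question.
    if temp > 50 or wind <= 3:
        return temp
    return 35.74 + 0.6215 * temp - 35.75 * wind**0.16 + 0.4275 * temp * wind**0.16


# The five sample rows from the question (index kept timezone-naive here).
index = pd.to_datetime([
    "2003-03-01 06:00", "2003-03-01 07:00", "2003-03-01 08:00",
    "2003-03-01 09:00", "2003-03-01 10:00",
])
df = pd.DataFrame(
    {
        "Day": [1, 1, 1, 1, 1],
        "Temperature": [30.27, 30.21, 31.81, 34.04, 35.31],
        "Wind Speed": [5.27, 4.83, 6.09, 6.61, 6.97],
        "Year": [2003, 2003, 2003, 2003, 2003],
    },
    index=index,
)
df["Wind Chill"] = np.nan

for row_index, row in df.iterrows():
    # `row` is only read here; the result is written to df by index label,
    # so it lands in the DataFrame instead of in the temporary copy.
    df.at[row_index, "Wind Chill"] = windchill(row["Temperature"], row["Wind Speed"])

print(df)
```

This relies on the index labels being unique. The apply version is still shorter, but the loop form makes it explicit where the value is stored.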

The following also leaves df unchanged:

In [24]: row = df.ix[0]  # also a copy

In [25]: row['Day'] = 2

Whereas if we do take a view, we will see the change in df:

In [26]: row = df.ix[2:3]  # this one's a view

In [27]: row['Day'] = 3

See "Returning a view versus a copy" in the pandas documentation.
