Why doesn't this function "take" after I iterrows over a pandas DataFrame?

Views: 79 · Published: 2020/5/24 0:15:54 · Tags: python, pandas


Question

I have a DataFrame with timestamped temperature and wind speed values, and a function to convert those into a "wind chill." I'm using iterrows to run the function on each row, and hoping to get a DataFrame out with a nifty "Wind Chill" column.

However, while it seems to work as it's going through, and has actually "worked" at least once, I can't seem to replicate it consistently. I feel like I'm missing something about the structure of DataFrames in general, but I'm hoping someone can help.

In [28]: bigdf.head()
Out[28]: 


                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00  1    30.27        5.27        2003
2003-03-01 07:00:00-05:00  1    30.21        4.83        2003
2003-03-01 08:00:00-05:00  1    31.81        6.09        2003
2003-03-01 09:00:00-05:00  1    34.04        6.61        2003
2003-03-01 10:00:00-05:00  1    35.31        6.97        2003

So I add a 'Wind Chill' column to bigdf and prepopulate it with NaN.

In [29]: bigdf['Wind Chill'] = NaN

Then I try to iterate over the rows, to add the actual wind chill values.

In [30]: for row_index, row in bigdf[:5].iterrows():
    ...:     row['Wind Chill'] = windchill(row['Temperature'], row['Wind Speed'])
    ...:     print(row['Wind Chill'])
    ...:
24.7945889994
25.1365267133
25.934114012
28.2194307516
29.5051046953

As you can see, the new values appear to be applied to the 'Wind Chill' column. Here's the windchill function, just in case that helps:

def windchill(temp, wind):
    if temp>50 or wind<=3:
        return temp
    else:
        return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16

But when I look at the DataFrame again, the NaNs are still there:

In [31]: bigdf.head()
Out[31]: 

                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00  1    30.27        5.27        2003  NaN
2003-03-01 07:00:00-05:00  1    30.21        4.83        2003  NaN
2003-03-01 08:00:00-05:00  1    31.81        6.09        2003  NaN
2003-03-01 09:00:00-05:00  1    34.04        6.61        2003  NaN
2003-03-01 10:00:00-05:00  1    35.31        6.97        2003  NaN

What's even weirder is that it has worked once or twice, and I can't tell what I did differently.

I must admit I'm not especially familiar with the inner workings of pandas, and I get confused by indexing, etc., so I feel like I'm probably missing something very basic here (or doing this the hard way).

Thanks!

Recommended answer

You can use apply to do this:

In [11]: df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
                 axis=1)
Out[11]:
2003-03-01 06:00:00-05:00    24.794589
2003-03-01 07:00:00-05:00    25.136527
2003-03-01 08:00:00-05:00    25.934114
2003-03-01 09:00:00-05:00    28.219431
2003-03-01 10:00:00-05:00    29.505105

In [12]: df['Wind Chill'] = df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
                                    axis=1)

In [13]: df
Out[13]:
                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003   24.794589
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003   25.136527
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003   25.934114
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003   28.219431
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003   29.505105
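
As a possible alternative to the row-wise apply above, the same formula can be computed on whole columns at once. The following is only a sketch: it swaps in numpy.where for the if/else branch, the helper name windchill_vectorized is hypothetical (it is not part of the original answer), and the sample frame copies the five rows shown in the question without the timestamp index.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question's first five rows (index omitted).
df = pd.DataFrame({
    "Temperature": [30.27, 30.21, 31.81, 34.04, 35.31],
    "Wind Speed": [5.27, 4.83, 6.09, 6.61, 6.97],
})


def windchill_vectorized(temp, wind):
    """Column-wise version of windchill (hypothetical helper name)."""
    chill = 35.74 + 0.6215 * temp - 35.75 * wind**0.16 + 0.4275 * temp * wind**0.16
    # Same rule as the scalar function: keep the raw temperature when it is
    # above 50 or when the wind speed is 3 or less.
    return np.where((temp > 50) | (wind <= 3), temp, chill)


# Assign the whole column in one step; no per-row Python calls are needed.
df["Wind Chill"] = windchill_vectorized(df["Temperature"], df["Wind Speed"])
```

On frames with many rows this usually avoids the per-row overhead of apply with axis=1, at the cost of evaluating the formula for every row before the condition is applied.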

To expand on the reason for your confusion, I think it stems from the fact that the row variables are copies rather than views of the df, so changes don't propagate:

In [21]: for _, row in df.iterrows(): row['Day'] = 2

We see that the change is made successfully to the copy (the row variable):

In [22]: row
Out[22]:
Day               2.00
Temperature      35.31
Wind Speed        6.97
Year           2003.00
Name: 2003-03-01 10:00:00-05:00

But the changes don't propagate to the DataFrame:

In [23]: df
Out[23]:
                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003

The following also leaves df unchanged:

In [24]: row = df.ix[0]  # also a copy

In [25]: row['Day'] = 2

Whereas if we do take a view, we'll see the change in df:

In [26]: row = df.ix[2:3]  # this one's a view

In [27]: row['Day'] = 3

See Returning a view versus a copy (in the docs).
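
If an explicit iterrows loop is still preferred, one way to sidestep the copy issue is to treat each row as read-only, collect the results, and assign the new column once after the loop. This is a minimal sketch under assumptions: the sample frame copies three rows from the question with simplified timestamps, and the list name chills is made up for illustration.

```python
import pandas as pd


def windchill(temp, wind):
    # Same formula as in the question.
    if temp > 50 or wind <= 3:
        return temp
    return 35.74 + 0.6215 * temp - 35.75 * wind**0.16 + 0.4275 * temp * wind**0.16


# Three sample rows copied from the question (timezone dropped for brevity).
df = pd.DataFrame(
    {
        "Day": [1, 1, 1],
        "Temperature": [30.27, 30.21, 31.81],
        "Wind Speed": [5.27, 4.83, 6.09],
    },
    index=pd.to_datetime(
        ["2003-03-01 06:00", "2003-03-01 07:00", "2003-03-01 08:00"]
    ),
)

# Only read from each row; never assign into it, since it may be a copy.
chills = [
    windchill(row["Temperature"], row["Wind Speed"])
    for _, row in df.iterrows()
]

# Write to the DataFrame itself, in a single column assignment.
df["Wind Chill"] = chills
```

Because the write goes through df rather than through row, the result does not depend on whether a given row happens to be a copy or a view. Note also that the .ix indexer used in the transcripts above was removed in pandas 1.0; .loc and .iloc are the current label-based and position-based indexers.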
