Why doesn't this function "take" after I iterrows over a pandas DataFrame?
Question
I have a DataFrame with timestamped temperature and wind speed values, and a function to convert those into a "wind chill." I'm using iterrows to run the function on each row, and hoping to get a DataFrame out with a nifty "Wind Chill" column.
However, while it seems to work as it's going through, and has actually "worked" at least once, I can't seem to replicate it consistently. I feel like it's something I'm missing about the structure of DataFrames, in general, but I'm hoping someone can help.
In [28]: bigdf.head()
Out[28]:
                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003
So I add a 'Wind Chill' column to bigdf and prepopulate it with NaN.
In [29]: bigdf['Wind Chill'] = np.nan
Then I try to iterate over the rows, to add the actual Wind Chills.
In [30]: for row_index, row in bigdf[:5].iterrows():
...:     row['Wind Chill'] = windchill(row['Temperature'], row['Wind Speed'])
...:     print(row['Wind Chill'])
...:
24.7945889994
25.1365267133
25.934114012
28.2194307516
29.5051046953
As you can see, the new values appear to be applied to the 'Wind Chill' column. Here's the windchill function, just in case it helps:
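def windchill(temp, wind):
    # NWS wind chill formula: temp in °F, wind speed in mph.
    # It only applies below 50 °F and above 3 mph; otherwise return temp unchanged.
    if temp > 50 or wind <= 3:
        return temp
    else:
        return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16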
But when I look at the DataFrame again, the NaNs are still there:
In [31]: bigdf.head()
Out[31]:
                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003         NaN
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003         NaN
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003         NaN
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003         NaN
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003         NaN
What's even weirder is that it has worked once or twice, and I can't tell what I did differently.
I must admit I'm not especially familiar with the inner workings of pandas, and get confused with indexing, etc., so I feel like I'm probably missing something very basic here (or doing this the hard way).
Thanks!
Recommended answer
You can use apply with axis=1 to do this:
In [11]: df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
...:              axis=1)
Out[11]:
2003-03-01 06:00:00-05:00 24.794589
2003-03-01 07:00:00-05:00 25.136527
2003-03-01 08:00:00-05:00 25.934114
2003-03-01 09:00:00-05:00 28.219431
2003-03-01 10:00:00-05:00 29.505105
In [12]: df['Wind Chill'] = df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
...:                             axis=1)
In [13]: df
Out[13]:
                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003   24.794589
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003   25.136527
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003   25.934114
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003   28.219431
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003   29.505105
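Since apply with axis=1 still calls windchill once per row in Python, a column-wise computation can be faster on a big frame. A minimal sketch, assuming numpy is available: windchill_vectorized is a hypothetical helper (not part of the original answer) that reproduces windchill's two branches with numpy.where, and the frame rebuilds the five sample rows above without the timezone.

```python
import numpy as np
import pandas as pd


def windchill(temp, wind):
    # Scalar function from the question (NWS formula, °F and mph).
    if temp > 50 or wind <= 3:
        return temp
    else:
        return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16


def windchill_vectorized(temp, wind):
    # Hypothetical helper: the same formula evaluated on whole columns at once.
    chill = 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16
    # Keep the raw temperature wherever the formula does not apply.
    return np.where((temp > 50) | (wind <= 3), temp, chill)


# The five sample rows from the question (timezone omitted for brevity).
df = pd.DataFrame(
    {
        "Day": [1, 1, 1, 1, 1],
        "Temperature": [30.27, 30.21, 31.81, 34.04, 35.31],
        "Wind Speed": [5.27, 4.83, 6.09, 6.61, 6.97],
        "Year": [2003] * 5,
    },
    index=pd.date_range("2003-03-01 06:00", periods=5, freq="60min"),
)

# Row-wise apply, as in the answer above.
by_apply = df.apply(lambda row: windchill(row["Temperature"], row["Wind Speed"]), axis=1)

# Column-wise version: no Python-level loop over rows.
df["Wind Chill"] = windchill_vectorized(df["Temperature"], df["Wind Speed"])

print(df)
```

The two results should match. Note that numpy.where evaluates the formula for every row and only then picks the raw temperature where the formula does not apply, so it trades a little wasted arithmetic for avoiding the per-row function call.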
To expand on the reason for your confusion, I think it stems from the fact that the row variables yielded by iterrows are copies rather than views of df, so changes made to them don't propagate back:
In [21]: for _, row in df.iterrows(): row['Day'] = 2
We see that it is making the change successfully to the copy, the row variable(s):
In [22]: row
Out[22]:
Day 2.00
Temperature 35.31
Wind Speed 6.97
Year 2003.00
Name: 2003-03-01 10:00:00-05:00
But the changes don't propagate back to the DataFrame:
In [23]: df
Out[23]:
                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003
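The "Day 2.00" and "Year 2003.00" in the row output above hint at why: with a mix of integer and float columns, iterrows builds a fresh float64 Series for each row, so the row cannot be a view onto the frame's integer column. A small self-contained check, assuming a two-row frame made up purely for illustration:

```python
import pandas as pd

# Illustrative frame: one integer column and one float column (mixed dtypes).
df = pd.DataFrame({"Day": [1, 1], "Temperature": [30.27, 30.21]})

for _, row in df.iterrows():
    # Each row is a new Series; the ints were upcast so the row has a single dtype.
    row_dtype = row.dtype
    row["Day"] = 2  # changes only this temporary Series

print(row_dtype)           # float64
print(df["Day"].tolist())  # [1, 1]: the frame itself is untouched
```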
The following also leaves df unchanged:
In [24]: row = df.iloc[0]  # also a copy
In [25]: row['Day'] = 2
Whereas if we do take a view, we'll see a change in df:
In [26]: row = df.iloc[2:3]  # a view in the pandas version used here (not under Copy-on-Write)
In [27]: row['Day'] = 3
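If you do want to assign inside a loop, write through the DataFrame itself using the row label that iterrows yields (for example with DataFrame.at) rather than through the row object. Whether a slice like the one above behaves as a view depends on the pandas version; under Copy-on-Write, the default from pandas 3.0, any indexing result behaves as a copy, so assigning via df.at or df.loc is the reliable route. A minimal sketch, reusing the question's windchill function on a small illustrative frame; for anything but small frames, the apply approach at the top of this answer is simpler and faster.

```python
import numpy as np
import pandas as pd


def windchill(temp, wind):
    # Scalar wind chill function from the question (°F, mph).
    if temp > 50 or wind <= 3:
        return temp
    return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16


# Two of the question's sample rows, for illustration.
bigdf = pd.DataFrame(
    {"Temperature": [30.27, 30.21], "Wind Speed": [5.27, 4.83]},
    index=pd.date_range("2003-03-01 06:00", periods=2, freq="60min"),
)
bigdf["Wind Chill"] = np.nan

for row_index, row in bigdf.iterrows():
    # Assign into the frame by label; `row` itself is only a copy.
    bigdf.at[row_index, "Wind Chill"] = windchill(row["Temperature"], row["Wind Speed"])

print(bigdf)
```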
See "Returning a view versus a copy" in the pandas documentation.