Updating value in iterrows for pandas
Question
I am doing some geocoding work in which I use selenium to screen-scrape the x-y coordinates for the address of each location. I imported an xls file into a pandas DataFrame and want to use an explicit loop to update the rows that do not have x-y coordinates, like below:
for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi
        dict_temp = geocoding(target)
        row.wgs1984_latitude = dict_temp['lat']
        row.wgs1984_longitude = dict_temp['long']
I have read "Why doesn't this function 'take' after I iterrows over a pandas DataFrame?" and am fully aware that iterrows only gives us a copy rather than a view for editing, but what if I really need to update the values row by row? Is a lambda feasible?
Recommended answer
The rows you get back from iterrows are copies that are no longer connected to the original DataFrame, so edits to them don't change your DataFrame. Thankfully, because each item you get back from iterrows contains the current index, you can use that to access and edit the relevant row of the DataFrame:
for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi
        dict_temp = geocoding(target)
        # Write through the DataFrame itself, using the index from iterrows;
        # assigning to `row` would only change the disconnected copy.
        rche_df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.loc[index, 'wgs1984_longitude'] = dict_temp['long']
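To make the difference concrete, here is a minimal self-contained sketch of the pattern above. The toy data and the fake_geocoding function are hypothetical stand-ins (the real geocoding() from the question drives selenium and is not shown), and pd.isna is used to detect missing coordinates, which assumes they are stored as NaN rather than relying on the isinstance check.

```python
import numpy as np
import pandas as pd


def fake_geocoding(address):
    # Hypothetical stand-in for the asker's selenium-based geocoding();
    # returns fixed coordinates so the example runs offline.
    return {"lat": 22.3, "long": 114.2}


def fill_missing_coordinates(df, geocode):
    # Work on a copy so the caller's frame is left alone.
    result = df.copy()
    for index, row in result.iterrows():
        # Assumption: a missing coordinate is stored as NaN.
        if pd.isna(row["wgs1984_latitude"]):
            coords = geocode(row["address_chi"])
            # Write through the frame via .loc, not through `row`.
            result.loc[index, "wgs1984_latitude"] = coords["lat"]
            result.loc[index, "wgs1984_longitude"] = coords["long"]
    return result


# Toy data: the second row has no coordinates yet.
rche_df = pd.DataFrame(
    {
        "address_chi": ["A", "B"],
        "wgs1984_latitude": [22.28, np.nan],
        "wgs1984_longitude": [114.15, np.nan],
    }
)

# With mixed column dtypes, as here, each `row` is a copy,
# so assigning to it does not reach the frame.
for index, row in rche_df.iterrows():
    row["wgs1984_latitude"] = 0.0

filled = fill_missing_coordinates(rche_df, fake_geocoding)
```

After this runs, rche_df still holds its original values, while filled has the second row's coordinates populated.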
In my experience, this approach seems slower than using something like apply or map, but as always, it's up to you to decide how to trade off performance against ease of coding.
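For comparison, a loop-free version might look roughly like the sketch below, which also shows one way a lambda can be used. It assumes missing coordinates are stored as NaN and uses a hypothetical fake_geocoding function and toy data in place of the real scraper; whether it is actually faster is not measured here.

```python
import numpy as np
import pandas as pd


def fake_geocoding(address):
    # Hypothetical stand-in for the asker's selenium-based geocoding().
    return {"lat": 22.3, "long": 114.2}


# Toy data: the second row has no coordinates yet.
rche_df = pd.DataFrame(
    {
        "address_chi": ["A", "B"],
        "wgs1984_latitude": [22.28, np.nan],
        "wgs1984_longitude": [114.15, np.nan],
    }
)

# Boolean mask of the rows that still need coordinates.
missing = rche_df["wgs1984_latitude"].isna()

# Geocode only those addresses; the result is a Series of dicts
# that keeps the index labels of the selected rows.
coords = rche_df.loc[missing, "address_chi"].apply(fake_geocoding)

# Assignment aligns on the index, so each value lands in its own row.
rche_df.loc[missing, "wgs1984_latitude"] = coords.map(lambda d: d["lat"])
rche_df.loc[missing, "wgs1984_longitude"] = coords.map(lambda d: d["long"])
```

The mask keeps already-geocoded rows untouched, so the scraper is only called where a coordinate is missing.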