Updating value in iterrow for pandas
Problem description
I am doing some geocoding work in which I use selenium to screen-scrape the x-y coordinates I need for the address of each location. I imported an xls file into a pandas DataFrame and want to use an explicit loop to update the rows that do not have x-y coordinates, like below:
for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi
        dict_temp = geocoding(target)
        row.wgs1984_latitude = dict_temp['lat']
        row.wgs1984_longitude = dict_temp['long']
I have read 'Why doesn't this function "take" after I iterrows over a pandas DataFrame?' and am fully aware that iterrows only gives us a copy of each row rather than a view we can edit, but what if I really want to update the values row by row? Is lambda feasible?
Recommended answer
The rows you get back from iterrows are copies that are no longer connected to the original DataFrame, so edits to them don't change your DataFrame. Thankfully, because each item you get back from iterrows contains the current index, you can use that index with .loc to access and edit the relevant row of the DataFrame:
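for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        # No row.copy() needed: row is already a copy, and we only read from it
        target = row.address_chi
        dict_temp = geocoding(target)  # the asker's own selenium-based helper
        # Write back through the original DataFrame using the row's index label
        rche_df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.loc[index, 'wgs1984_longitude'] = dict_temp['long']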
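To make the difference concrete, here is a small self-contained sketch on a toy DataFrame. The data values and the fake_geocoding function are hypothetical stand-ins for the asker's spreadsheet and selenium-based geocoding(). It also assumes that a missing coordinate shows up as NaN, so it tests with pd.isna rather than the isinstance check. Assigning into the yielded row leaves the frame unchanged (the pandas documentation notes that, depending on the dtypes, iterrows returns a copy), whereas writing through df.loc with the index label sticks.

```python
import pandas as pd


def fake_geocoding(address):
    """Hypothetical stand-in for the asker's selenium-based geocoding()."""
    return {"lat": 22.0, "long": 114.0}


# Toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "address_chi": ["addr 1", "addr 2"],
    "wgs1984_latitude": [22.3, float("nan")],
    "wgs1984_longitude": [114.2, float("nan")],
})

# 1) Assigning into the yielded row only changes that row object.
#    This frame mixes str and float columns, so each row is a freshly built copy.
for index, row in df.iterrows():
    row["wgs1984_latitude"] = -1.0
print(df["wgs1984_latitude"].tolist())  # [22.3, nan]: the DataFrame is unchanged

# 2) Writing through the index label updates the DataFrame itself.
for index, row in df.iterrows():
    if pd.isna(row["wgs1984_latitude"]):  # only rows still missing coordinates
        coords = fake_geocoding(row["address_chi"])
        df.loc[index, "wgs1984_latitude"] = coords["lat"]
        df.loc[index, "wgs1984_longitude"] = coords["long"]
print(df["wgs1984_latitude"].tolist())  # [22.3, 22.0]
```

For a single scalar, df.at[index, "wgs1984_latitude"] would work as well as .loc here.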
In my experience, this approach seems slower than using an approach like apply or map, but as always, it's up to you to decide how to make the performance/ease-of-coding tradeoff.
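As for the question's "Is lambda feasible?", it appears so, for example with Series.map. The following is a minimal sketch that again uses a hypothetical fake_geocoding stub and invented data, and assumes missing coordinates are NaN. It selects the rows that need geocoding with a boolean mask, maps the lookup over their addresses, then assigns the results back with .loc, which aligns on the index. The lambda is still called once per address, so this mainly tidies the code; the selenium lookup itself is likely to dominate the running time either way.

```python
import pandas as pd


def fake_geocoding(address):
    """Hypothetical stand-in for the asker's selenium-based geocoding()."""
    return {"lat": 22.0, "long": 114.0}


# Toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "address_chi": ["addr 1", "addr 2", "addr 3"],
    "wgs1984_latitude": [22.3, float("nan"), float("nan")],
    "wgs1984_longitude": [114.2, float("nan"), float("nan")],
})

# Boolean mask of the rows that still lack a latitude.
missing = df["wgs1984_latitude"].isna()

# Geocode only those addresses. Passing fake_geocoding directly would also work;
# the lambda just mirrors the question.
results = df.loc[missing, "address_chi"].map(lambda addr: fake_geocoding(addr))

# Unpack the returned dicts and assign them back; .loc aligns on the index labels.
df.loc[missing, "wgs1984_latitude"] = results.map(lambda d: d["lat"])
df.loc[missing, "wgs1984_longitude"] = results.map(lambda d: d["long"])
print(df)
```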