Is there a slower or more controlled alternative to .apply()?

Question

This may seem like an odd question, but I have a pandas DataFrame with addresses in it that I want to geocode so I can get the latitude and longitude.

I have code that works using .apply(), thanks to this very helpful thread (new column with coordinates using geopy pandas), but my problem is that all of the open APIs have strict limits on how many requests they allow per second, and also per day.

I haven't been able to find any way to throttle my code to match the limits of the APIs. My DF has 25K rows, but I've only been able to geocode successfully when I create a subset of it with up to 5 rows.

I don't have a lot of experience with Python and pandas, but in SAS the DATA step iterates one row at a time, so I could use a sleep command to throttle the requests. What would be the best way to implement something like that with Python/pandas?

So based on the answers so far, I wanted to confirm that my code would change from:

df_small['city_coord'] = df_small['Address'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))

to:

from time import sleep

df_small = df_clean[:5]

def f(x, delay=1):
    # pause before each request to stay under the API's rate limit
    sleep(delay)
    return geolocator.geocode(x)

df_small['city_coord'] = df_small['Address'].apply(f).apply(lambda x: (x.latitude, x.longitude))

Recommended answer

To iterate with a delay, you can use df.iterrows() and time.sleep():

from time import sleep

for index, row in df.iterrows():  # iterrows() yields (index, row) pairs
    # run your code
    sleep(1) # how many seconds to wait

Or you can just put time.sleep() within the apply function itself (as @RafaelC suggests in the comments):

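def f(x, delay=1):
    # run your code
    sleep(delay)  # pause after handling each element

df['Address'].apply(f)  # call apply on a single column so f runs once per row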
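
A fixed sleep adds the full delay to every call, even when the request itself already took a while. One possible refinement, sketched below under my own assumptions, is a wrapper that only waits for whatever is left of a minimum interval between calls. The names throttled and echo_upper are hypothetical and exist only for this illustration; echo_upper stands in for a real geocoding call.

```python
import time


def throttled(func, min_interval=1.0):
    """Wrap func so consecutive calls start at least min_interval seconds apart."""
    last_call = None  # start time of the previous call, None before the first call

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            # Sleep only for the part of the interval that has not yet elapsed
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


def echo_upper(address):
    """Hypothetical stand-in for a geocoding call."""
    return address.upper()


slow_upper = throttled(echo_upper, min_interval=0.05)
start = time.monotonic()
results = [slow_upper(a) for a in ["a", "b", "c"]]
elapsed = time.monotonic() - start
```

The wrapped function could then be handed to apply, for example df['Address'].apply(throttled(geolocator.geocode)). If I remember correctly, geopy also ships a RateLimiter helper in geopy.extra.rate_limiter that serves a similar purpose, so it may be worth checking before rolling your own.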
