Slow Python loop to look up data in another data frame


Question

I have two data frames: one with all my data (called 'data') and one with the latitudes and longitudes of the different stations where each observation starts and ends (called 'info'). I am trying to get a data frame that has the latitude and longitude next to each station in each observation. My code in Python:

for i in range(0,15557580):
    for j in range(0,542):
         if data.year[i] == '2018' and data.station[i]==info.station[j]:
             data.latitude[i] = info.latitude[j]
             data.longitude[i] = info.longitude[j]
             break

But since I have about 15 million observations, this takes a lot of time. Is there a quicker way of doing it?

Thank you very much (I am still new to this).

My file info looks like this (about 500 observations, one for each station)

My file data looks like this (there are other variables not shown here; about 15 million observations, one for each trip)

What I am looking to get is that, when the station numbers match, the resulting data would look like this:

Recommended answer

This is one solution. You can also use pandas.merge to add 2 new columns to data and perform the equivalent mapping.

# create series mappings from info
s_lat = info.set_index('station')['latitude']
s_lon = info.set_index('station')['longitude']

# calculate Boolean mask on year
mask = data['year'] == '2018'

# apply mappings, if no map found use fillna to retrieve original data
data.loc[mask, 'latitude'] = data.loc[mask, 'station'].map(s_lat)\
                                 .fillna(data.loc[mask, 'latitude'])

data.loc[mask, 'longitude'] = data.loc[mask, 'station'].map(s_lon)\
                                  .fillna(data.loc[mask, 'longitude'])
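
The pandas.merge alternative mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: the toy frames are made up, 'data' is assumed not to have latitude and longitude columns yet (if it does, merge would add suffixes such as _x and _y to the clashing names), and 'info' is assumed to list each station only once.

```python
import pandas as pd

# Hypothetical toy frames standing in for the real 'info' and 'data'.
info = pd.DataFrame({
    "station": [1, 2, 3],
    "latitude": [45.50, 45.52, 45.54],
    "longitude": [-73.57, -73.58, -73.60],
})
data = pd.DataFrame({
    "year": ["2018", "2018", "2017", "2018"],
    "station": [2, 1, 3, 9],
})

# A left merge keeps every row of data in its original order.
# Stations missing from info get NaN coordinates.
# validate raises an error if a station appears more than once in info.
merged = data.merge(info, on="station", how="left", validate="many_to_one")

# Mirror the year filter from the question: only 2018 rows keep coordinates.
not_2018 = merged["year"] != "2018"
merged.loc[not_2018, ["latitude", "longitude"]] = float("nan")

print(merged)
```

Like the map approach, this replaces the nested Python loops with a single vectorized lookup on the station key, which is where the speedup comes from.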
