Python: Issue with groupby() and apply() when applying defined haversine function


Question

I am trying to compute distances for the following dataset with a haversine function I defined. The function works well on other data. However, with this particular dataset, when I use groupby(df.index) I get the error:

cannot convert the series to <class 'float'>

I've used groupby() and apply() before without problems. I can't understand what is happening in this case or how to fix it.

Here is the data:

                                            latitude    longitude   datetime
356a192b7913b04c54574d18c28d46e6395428ab    57.723610   11.925191   2021-06-13 14:22:11.682
356a192b7913b04c54574d18c28d46e6395428ab    57.723614   11.925187   2021-06-13 14:22:13.562
356a192b7913b04c54574d18c28d46e6395428ab    57.723610   11.925172   2021-06-13 14:22:28.635
da4b9237bacccdf19c0760cab7aec4a8359010b0    57.723637   11.925056   2021-06-13 14:22:59.336
da4b9237bacccdf19c0760cab7aec4a8359010b0    57.724075   11.923708   2021-06-13 14:23:44.905
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723610   11.925191   2021-06-13 14:22:04.000
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723614   11.925178   2021-06-13 14:22:44.170
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723827   11.924635   2021-06-13 14:23:14.479
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723866   11.924005   2021-06-13 14:23:29.605

The code is as follows:


df2 = pd.concat([df.add_suffix('_pre').shift(), trips], axis=1)
df2

>>

                                           latitude_pre longitude_pre   datetime_pre    latitude    longitude   datetime
356a192b7913b04c54574d18c28d46e6395428ab            NaN         NaN                  NaT    57.723610   11.925191   2021-06-13 14:22:11.682
356a192b7913b04c54574d18c28d46e6395428ab    57.723610   11.925191   2021-06-13 14:22:11.682 57.723614   11.925187   2021-06-13 14:22:13.562
356a192b7913b04c54574d18c28d46e6395428ab    57.723614   11.925187   2021-06-13 14:22:13.562 57.723610   11.925172   2021-06-13 14:22:28.635
da4b9237bacccdf19c0760cab7aec4a8359010b0    57.723610   11.925172   2021-06-13 14:22:28.635 57.723637   11.925056   2021-06-13 14:22:59.336
da4b9237bacccdf19c0760cab7aec4a8359010b0    57.723637   11.925056   2021-06-13 14:22:59.336 57.724075   11.923708   2021-06-13 14:23:44.905
77de68daecd823babbb58edb1c8e14d7106e83bb    57.724075   11.923708   2021-06-13 14:23:44.905 57.723610   11.925191   2021-06-13 14:22:04.000
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723610   11.925191   2021-06-13 14:22:04.000 57.723614   11.925178   2021-06-13 14:22:44.170
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723614   11.925178   2021-06-13 14:22:44.170 57.723827   11.924635   2021-06-13 14:23:14.479
77de68daecd823babbb58edb1c8e14d7106e83bb    57.723827   11.924635   2021-06-13 14:23:14.479 57.723866   11.924005   2021-06-13 14:23:29.605


df2.groupby(df2.index).apply(lambda x: haversine(x['latitude_pre'], x['longitude_pre'], x['latitude'], x['longitude']))

>>
cannot convert the series to <class 'float'>

In case it is needed, here is haversine():

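import math
from math import radians

def haversine(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two points given in degrees.
    # The math functions used here accept single numbers only.
    R = 6373.0 * 1000  # Earth's radius (in m)

    dlon = radians(lon2) - radians(lon1)
    dlat = radians(lat2) - radians(lat1)

    a = math.sin(dlat / 2)**2 + math.cos(radians(lat1)) * math.cos(radians(lat2)) * math.sin(dlon / 2)**2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))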

The _pre columns are needed because I am iterating over point coordinates stored in the same columns, so each row needs the previous point alongside it. The shift leaves the first row empty because the first point has no previous point to compute a distance from.

I tried converting the datetime column from datetime to epoch seconds, but the error persists. Currently, all the columns are of type float.

To convert it to epoch I used:

import datetime as dt

df['datetime'] = (df['datetime'] - dt.datetime(1970,1,1)).dt.total_seconds()

I also tried:

shift(fill_value=0)

and got the same error.

Recommended answer

If you add print(lat1) to your haversine function, you get this printed:

356a192b7913b04c54574d18c28d46e6395428ab          NaN
356a192b7913b04c54574d18c28d46e6395428ab    57.723610
356a192b7913b04c54574d18c28d46e6395428ab    57.723614
Name: latitude_pre, dtype: float64
356a192b7913b04c54574d18c28d46e6395428ab          NaN
356a192b7913b04c54574d18c28d46e6395428ab    57.723610
356a192b7913b04c54574d18c28d46e6395428ab    57.723614
Name: latitude_pre, dtype: float64
356a192b7913b04c54574d18c28d46e6395428ab          NaN
356a192b7913b04c54574d18c28d46e6395428ab    57.723610
356a192b7913b04c54574d18c28d46e6395428ab    57.723614
Name: latitude_pre, dtype: float64
356a192b7913b04c54574d18c28d46e6395428ab          NaN
356a192b7913b04c54574d18c28d46e6395428ab    57.723610
356a192b7913b04c54574d18c28d46e6395428ab    57.723614
Name: latitude_pre, dtype: float64

The value of lat1 is a Series, not a single value: inside groupby().apply(), x is the whole sub-DataFrame for one group, so x['latitude_pre'] is a column of that group. Is that what you intended? It's not clear from the question, but I believe the error occurs because the math functions in haversine() expect a single number.
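One possible way around this, sketched here under some assumptions rather than taken from the original answer, is to replace the math calls with their NumPy equivalents so the function accepts whole Series, and to shift within each group so a trip never borrows the last point of the previous trip. The name haversine_np, the short index labels and the distance_m column are made up for illustration.

```python
import math

import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6373.0 * 1000  # same radius as the question's haversine()


def haversine_np(lat1, lon1, lat2, lon2):
    """Haversine distance in metres; accepts scalars or whole Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return EARTH_RADIUS_M * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


# Small hypothetical frame shaped like the question's data (short ids for brevity).
df = pd.DataFrame(
    {
        "latitude": [57.723610, 57.723614, 57.723610, 57.723637, 57.724075],
        "longitude": [11.925191, 11.925187, 11.925172, 11.925056, 11.923708],
    },
    index=["a", "a", "a", "b", "b"],
)

# math.radians needs one float, so passing a Series raises TypeError.
try:
    math.radians(df["latitude"])
    scalar_call_failed = False
except TypeError:
    scalar_call_failed = True

# Shift within each group: the first row of every trip gets NaN
# instead of the last point of the previous trip.
prev = df.groupby(level=0)[["latitude", "longitude"]].shift()

# One vectorized call over all rows; no apply() needed.
df["distance_m"] = haversine_np(
    prev["latitude"], prev["longitude"], df["latitude"], df["longitude"]
)
print(df)
```

The first row of each group ends up with NaN in distance_m, since it has no previous point. If a per-trip total is wanted, something like df.groupby(level=0)["distance_m"].sum() should work on top of this.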
