Pandas Dataframe rolling with two columns and two rows

Problem description

I got a dataframe with two columns that are holding Longitude and Latitude coordinates:

import pandas as pd

values = {'Latitude': {0: 47.021503365600005,
  1: 47.021503365600005,
  2: 47.021503365600005,
  3: 47.021503365600005,
  4: 47.021503365600005,
  5: 47.021503365600005},
 'Longitude': {0: 15.481974060399999,
  1: 15.481974060399999,
  2: 15.481974060399999,
  3: 15.481974060399999,
  4: 15.481974060399999,
  5: 15.481974060399999}}

df = pd.DataFrame(values)
df.head()

Now I want to apply a rolling window function on the dataframe that takes the Longitude AND Latitude (two columns) of one row and another row (window size 2) in order to calculate the haversine distance.

def haversine_distance(x):
    print (x)

df.rolling(2, axis=1).apply(haversine_distance)

My problem is that I never get all four values Lng1, Lat1 (first row) and Lng2, Lat2 (second row). If I use axis=1, then I will get Lng1 and Lat1 of the first row. If I use axis=0, then I will get Lng1 and Lng2 of the first and second row, but Longitude only.

How can I apply a rolling window using two rows and two columns? Somewhat like this:

def haversine_distance(x):
    row1 = x[0]
    row2 = x[1]
    lng1, lat1 = row1['Longitude'], row1['Latitude']
    lng2, lat2 = row2['Longitude'], row2['Latitude']
    # do your stuff here
    return 1

Currently I'm doing this calculation by joining the dataframe with itself by shift(-1) resulting in all four coordinates in one line. But it should be possible with rolling as well. Another option is combining Lng and Lat into one column and apply rolling with axis=0 onto that. But there must be an easier way, right?
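
For reference, the shift-based workaround described above can be sketched as follows. This is only an illustration under stated assumptions: the helper `haversine_km`, the constant `EARTH_RADIUS_KM` (a mean Earth radius of 6371 km), and the small `points` frame are hypothetical names and sample data, not part of the original code.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_km(lat1, lng1, lat2, lng2):
    """Vectorized haversine distance in kilometres; inputs are in degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical sample points, chosen so that the rows actually differ.
points = pd.DataFrame({"Latitude": [47.0, 47.0, 48.0],
                       "Longitude": [15.0, 16.0, 16.0]})

# shift(-1) aligns every row with the row that follows it.
nxt = points.shift(-1)
points["Distance"] = haversine_km(points["Latitude"], points["Longitude"],
                                  nxt["Latitude"], nxt["Longitude"])
```

The last row has no successor, so its distance is NaN. Because the whole computation is vectorized, this variant avoids calling a Python function once per window.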

Recommended answer

Since pandas v0.23 it is possible to pass a Series instead of an ndarray to Rolling.apply(). Just set raw=False.

raw : bool, default None

False : passes each row or column as a Series to the function.

True or None : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance. The raw parameter is required and will show a FutureWarning if not passed. In the future raw will default to False.

New in version 0.23.0.
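
A small sketch of the difference the raw parameter makes. The series `s`, the list `seen`, and the function `record_type` are hypothetical names used only for illustration. It records the type of object handed to the applied function in each mode.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
seen = []  # type names of the windows passed to the function


def record_type(window):
    seen.append(type(window).__name__)
    return window.sum()  # both ndarray and Series provide .sum()


s.rolling(2).apply(record_type, raw=True)   # windows arrive as NumPy arrays
s.rolling(2).apply(record_type, raw=False)  # windows arrive as Series, index included
```

With raw=False the window keeps its index labels, which is what lets a second column travel along with the values.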

So building on your given example, you could move the latitude to the index and pass the whole longitude series, including the index, to your function:

df = df.set_index('Latitude')
df['Distance'] = df['Longitude'].rolling(2).apply(haversine_distance, raw=False)
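
The function itself is left open above. A minimal sketch of what it might look like with this layout, assuming each window arrives as a two-element Series whose index holds the latitudes and whose values hold the longitudes. The constant `EARTH_RADIUS_KM` (mean Earth radius, 6371 km) and the sample frame `track` are assumptions made for illustration.

```python
import math

import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_distance(window):
    """Distance in km between the two points of a size-2 window.

    Assumes the index holds latitudes and the values hold longitudes,
    both in degrees.
    """
    lat1, lat2 = (math.radians(v) for v in window.index)
    lng1, lng2 = (math.radians(v) for v in window.to_numpy())
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical sample points, chosen so that the rows actually differ.
track = pd.DataFrame({"Latitude": [47.0, 47.0, 48.0],
                      "Longitude": [15.0, 16.0, 16.0]}).set_index("Latitude")
track["Distance"] = track["Longitude"].rolling(2).apply(haversine_distance, raw=False)
```

Each result is the distance from the previous row to the current one, so the first row is NaN because its window is incomplete. This is the mirror image of a shift(-1) join, where the last row is the empty one.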
