How to filter out positional data based on distance from a known reference trajectory?

Views: 38 Published: 2021/4/28 20:45:02 Tags: python pandas gps data-science data-cleaning

Problem description

I have an 87288-point dataset that I need to filter. The filtering fields for the dataset are an X position and a Y position, given as longitude and latitude. Plotted, the data looks like this:

The problem is that I only need the data along a certain path, which is known in advance. Something like this:

I already know how to filter data in a Pandas DataFrame, but since the path is not linear, I need an effective strategy to clear out all the noisy data with a certain degree of precision. The dataset is so large that manually picking the points is not an option.

Here is some sample data. The only important columns are Latitud and Longitud (latitude and longitude), which are Y and X respectively.

Sesion,Tiempo,Latitud,Longitud,PM2.5,Modo,Hora,DiaSemana
M-O-AM-07OCT19-DMR,2019-10-01 09:48:17.625,3.3659550000000005,-76.5288288,13.0,OUTDOOR,AM,1
M-O-AM-07OCT19-DMR,2019-10-07 08:18:03.555,3.3661757000000003,-76.5289441,12.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:04.596,3.3661757000000003,-76.5289441,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:05.572,3.3661767,-76.5289375,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:06.614,3.3661790999999996,-76.5289188,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:07.581,3.3661814,-76.5289024,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:08.588,3.3661847999999996,-76.52889820000001,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:09.570,3.3661922,-76.52890450000001,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:10.579,3.3661922,-76.52890450000001,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:11.577,3.3662135,-76.52893370000001,12.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:12.611,3.3662227999999996,-76.5289516,12.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:13.561,3.3662227999999996,-76.5289516,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:14.631,3.3662346,-76.5289927,11.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:15.554,3.3662421,-76.52901440000001,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:16.623,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:17.593,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:18.617,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:19.608,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:20.605,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:21.594,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:22.608,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:23.620,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:24.611,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:25.622,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:26.590,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:27.619,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:28.595,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:29.628,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0
M-O-AM-07OCT19-DMR,2019-10-07 08:18:30.621,3.3662523999999996,-76.5290363,10.0,OUTDOOR,AM,0

I have tried handpicking a few points inside the route and filtering the rest using a fixed minimum distance, something like this.

import pandas as pd
import geopy.distance

def get_dist(coords_1 , coords_2):
    return geopy.distance.distance(coords_1, coords_2).meters

dists=[
    (-76.5297163,3.3665631),
    (-76.5307019,3.3656924),
    (-76.5314718,3.3646900),
    (-76.5319956,3.3638394),
    (-76.5316622,3.3621781),
    (-76.5311999,3.3611796),
    (-76.5308636,3.3599338),
    (-76.5306335,3.3585191),
    (-76.5304758,3.3577502),
    (-76.5303957,3.3561101),
    (-76.5302998,3.3543178),
    (-76.5302220,3.3531897),
    (-76.5302369,3.3515283),
    (-76.5303363,3.3502667),
    (-76.5305351,3.3485951),
    (-76.5306779,3.3475220),
    (-76.5308545,3.3456382),
    (-76.5307738,3.3446934),
    (-76.530618,3.3430422)
]
df = pd.read_csv('movil.csv')


for index, row in df.iterrows():
    if index % 1000 == 0:
        print(index)  # progress indicator
    point = (row['Latitud'], row['Longitud'])
    # Distance in meters to the closest handpicked point.
    # dists stores (longitude, latitude); geopy expects (latitude, longitude).
    mind = min(get_dist(point, (lat, lon)) for lon, lat in dists)
    if mind > 125:
        df.drop(index, inplace=True)

print(df)

Using this approach I managed to get some cleaning done, but I feel a lot of useful data is getting filtered out.

Recommended answer

Let's start with some sample data. Note that latitude and longitude are kept in degrees for generation and plotting, but converted to radians for computation.

import numpy
import pandas

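def add_radians(df):
    # For every "*_deg" column, add a radian-valued copy without the suffix
    # (e.g. "lat_deg" -> "lat"). str.removesuffix needs Python 3.9+;
    # DataFrame.items() replaces iteritems(), which pandas 2.0 removed.
    return df.assign(**{colname.removesuffix("_deg"): numpy.radians(col) for colname, col in df.items()})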

n_ref = 26
ref_traj = pandas.DataFrame({"lat_deg": -76 + numpy.linspace(-1, 1, n_ref),
                             "lon_deg":   3 + numpy.linspace(-1, 1, n_ref)**2,
                            }).pipe(add_radians)

n = 500
traj = pandas.DataFrame({"lat_deg": -76 + numpy.cumsum(numpy.random.choice([-1, 1], size=n) * 0.05),
                         "lon_deg":   3 + numpy.cumsum(numpy.random.choice([-1, 1], size=n) * 0.05),
                        }).pipe(add_radians)

ax = traj.plot.scatter(x="lat_deg", y="lon_deg")
ax = ref_traj.plot.scatter(x="lat_deg", y="lon_deg", color="red", ax=ax)

Next, we can define a vectorized function returning the great-circle distance between two points, in kilometers, from coordinates given in radians. This should work for 1- or 2-dimensional arrays.

def distance(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance. All inputs are in radians and must
    # have shapes that broadcast against each other.
    R = 6371  # Radius of Earth in kilometers

    def hav(theta):
        # Haversine function: hav(theta) = sin^2(theta / 2)
        return numpy.sin(theta / 2)**2

    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    a = hav(d_lat) + numpy.cos(lat1) * numpy.cos(lat2) * hav(d_lon)
    # The central angle between the two points is 2 * arcsin(sqrt(a)).
    return 2 * R * numpy.arcsin(numpy.sqrt(a))
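
A quick way to check a haversine implementation is to compare it with a case whose answer is known in closed form. The sketch below is self-contained and uses a hypothetical helper name, haversine_km, rather than the distance function above; it assumes a spherical Earth of radius 6371 km. Along a meridian, one degree of latitude should come out as R * pi / 180, about 111.19 km.

```python
import numpy

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers; all angles are in radians."""
    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    a = (numpy.sin(d_lat / 2) ** 2
         + numpy.cos(lat1) * numpy.cos(lat2) * numpy.sin(d_lon / 2) ** 2)
    return 2 * radius_km * numpy.arcsin(numpy.sqrt(a))

# Two points on the same meridian, one degree of latitude apart.
one_degree = haversine_km(numpy.radians(3.0), numpy.radians(-76.5),
                          numpy.radians(4.0), numpy.radians(-76.5))
expected = 6371.0 * numpy.pi / 180  # arc length of one degree on the sphere

# A point is at distance zero from itself.
zero = haversine_km(numpy.radians(3.0), numpy.radians(-76.5),
                    numpy.radians(3.0), numpy.radians(-76.5))

print(float(one_degree), expected, float(zero))
```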

Then, we can find the minimum distance from each trajectory point to any reference trajectory point. This is computationally expensive, at O(N*M) for N trajectory points and M reference points, but we can vectorize it by broadcasting the reference points and trajectory points into 2-D arrays.

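def min_distance(ref_lat, ref_lon, lat, lon):
    # Convert to plain arrays so that pandas Series can be passed in;
    # recent pandas versions reject Series[:, numpy.newaxis].
    ref_lat, ref_lon = numpy.asarray(ref_lat), numpy.asarray(ref_lon)
    lat, lon = numpy.asarray(lat), numpy.asarray(lon)
    # One row per trajectory point, one column per reference point.
    shape = (lat.shape[0], ref_lat.shape[0])

    def broadcasted(a):
        return numpy.broadcast_to(a, shape=shape)

    d = distance(lat1=broadcasted(ref_lat),
                 lon1=broadcasted(ref_lon),
                 lat2=broadcasted(lat[:, numpy.newaxis]),
                 lon2=broadcasted(lon[:, numpy.newaxis]))
    # Minimum over the reference points for each trajectory point.
    return numpy.amin(d, axis=-1)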
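
The full N-by-M distance matrix can get large when there are tens of thousands of points and a long reference path. One possible variation, sketched below under stated assumptions, processes the trajectory in chunks so that only a chunk_size-by-M block is in memory at a time. The names haversine_km, min_distance_chunked and chunk_size are hypothetical and not part of the answer above, and the random test coordinates are made up for illustration.

```python
import numpy

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers; all angles are in radians."""
    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    a = (numpy.sin(d_lat / 2) ** 2
         + numpy.cos(lat1) * numpy.cos(lat2) * numpy.sin(d_lon / 2) ** 2)
    return 2 * radius_km * numpy.arcsin(numpy.sqrt(a))

def min_distance_chunked(ref_lat, ref_lon, lat, lon, chunk_size=10_000):
    """Minimum distance (km) from each point to any reference point.

    Inputs are 1-D arrays in radians. Only chunk_size x M distances are
    held in memory at once.
    """
    ref_lat = numpy.asarray(ref_lat)[numpy.newaxis, :]  # shape (1, M)
    ref_lon = numpy.asarray(ref_lon)[numpy.newaxis, :]
    lat = numpy.asarray(lat)
    lon = numpy.asarray(lon)
    out = numpy.empty(lat.shape[0])
    for start in range(0, lat.shape[0], chunk_size):
        stop = start + chunk_size
        d = haversine_km(ref_lat, ref_lon,
                         lat[start:stop, numpy.newaxis],   # shape (n, 1)
                         lon[start:stop, numpy.newaxis])
        out[start:stop] = d.min(axis=1)
    return out

# Made-up coordinates, only to compare chunked and unchunked results.
rng = numpy.random.default_rng(0)
ref_lat = numpy.radians(3 + numpy.linspace(0, 0.02, 20))
ref_lon = numpy.radians(-76.53 + numpy.linspace(0, 0.01, 20))
lat = numpy.radians(3 + rng.uniform(-0.01, 0.03, size=1000))
lon = numpy.radians(-76.53 + rng.uniform(-0.01, 0.02, size=1000))

chunked = min_distance_chunked(ref_lat, ref_lon, lat, lon, chunk_size=128)
full = min_distance_chunked(ref_lat, ref_lon, lat, lon, chunk_size=len(lat))
```

The total work is still O(N*M); chunking only bounds the peak memory.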

Next, we can choose a tolerance and flag the points whose minimum distance is less than the tolerance.

d = min_distance(ref_traj['lat'], ref_traj['lon'], traj['lat'], traj['lon'])
tolerance = 10  # in kilometers
near_ref = d < tolerance

Finally, we can use the boolean near_ref mask to filter the traj dataframe:

ax = ref_traj.plot.scatter(x="lat_deg", y="lon_deg", color="red")
traj[near_ref].plot.scatter(x="lat_deg", y="lon_deg", color="blue", ax=ax)
traj[~near_ref].plot.scatter(x="lat_deg", y="lon_deg", color="gray", ax=ax)
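
As a rough sketch of how the same idea might be applied to the question's data, the example below uses the Latitud and Longitud column names and three of the handpicked route points from the question. Everything else is an assumption made for illustration: the four DataFrame rows, the 30 m tolerance, and the densify helper, which linearly interpolates extra points between route vertices in degree space (reasonable only for short segments). With sparse reference points, a sample lying midway between two vertices can fall outside the tolerance even though it is on the path; densifying the reference keeps it.

```python
import numpy
import pandas

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in meters; all angles are in radians."""
    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    a = (numpy.sin(d_lat / 2) ** 2
         + numpy.cos(lat1) * numpy.cos(lat2) * numpy.sin(d_lon / 2) ** 2)
    return 2 * radius_m * numpy.arcsin(numpy.sqrt(a))

def densify(route, points_per_segment=10):
    """Hypothetical helper: add linearly interpolated points between vertices."""
    t = numpy.arange(len(route))
    t_dense = numpy.linspace(0, len(route) - 1,
                             (len(route) - 1) * points_per_segment + 1)
    return numpy.column_stack(
        [numpy.interp(t_dense, t, route[:, k]) for k in range(route.shape[1])])

def near_route(df, route, tolerance_m):
    """Boolean mask: True where a row is within tolerance_m of any route point."""
    ref_lon = numpy.radians(route[:, 0])[numpy.newaxis, :]
    ref_lat = numpy.radians(route[:, 1])[numpy.newaxis, :]
    lat = numpy.radians(df["Latitud"].to_numpy())[:, numpy.newaxis]
    lon = numpy.radians(df["Longitud"].to_numpy())[:, numpy.newaxis]
    return haversine_m(ref_lat, ref_lon, lat, lon).min(axis=1) < tolerance_m

# Made-up stand-in for movil.csv with the same coordinate column names.
df = pandas.DataFrame({
    "Latitud":  [3.3665631, 3.36612775, 3.3700000, 3.3646900],
    "Longitud": [-76.5297163, -76.5302091, -76.5200000, -76.5314000],
})

# Route vertices as (longitude, latitude), taken from the question's list.
route = numpy.array([
    (-76.5297163, 3.3665631),
    (-76.5307019, 3.3656924),
    (-76.5314718, 3.3646900),
])

tolerance_m = 30  # assumed value, chosen only to make the difference visible
mask_sparse = near_route(df, route, tolerance_m)
mask_dense = near_route(df, densify(route), tolerance_m)

filtered = df[mask_dense]
print(mask_sparse.tolist(), mask_dense.tolist())
```

The second row sits halfway between the first two vertices, roughly 73 m from each, so the sparse reference drops it while the densified one keeps it.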
