Using Pandas to calculate distance between coordinates from imported csv


Problem description


I am trying to import a .csv that contains columns of location data (lat/long), compute the distance between the points in each row, write that distance to a new column, loop the function over the next set of coordinates, and write the output data frame to a new .csv. I have written the following code:

import pandas as pd
import numpy as np
pd.read_csv("input.csv")

def dist_from_coordinates(lat1, lon1, lat2, lon2):
  R = 6371  # Earth radius in km

  #conversion to radians
  d_lat = np.radians(lat2-lat1)
  d_lon = np.radians(lon2-lon1)

  r_lat1 = np.radians(lat1)
  r_lat2 = np.radians(lat2)

  #haversine formula
  a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2

  haversine = 2 * R * np.arcsin(np.sqrt(a))

  return haversine

lat1 = row['lat1'] #first row of location.lat column here
lon1 = row['lon1'] #first row of location.long column here
lat2 = row['lat2'] #second row of location.lat column here
lon2 = row['lon2'] #second row of location.long column here

print(dist_from_coordinates(lat1, lon1, lat2, lon2), 'km')

df.to_csv('output.csv')

I am receiving the following error:

Traceback (most recent call last):
  File "Test.py", line 22, in <module>
    lat1 = row['lat1'] #first row of location.lat column here
NameError: name 'row' is not defined

Could additional feedback be provided on how to successfully loop this formula through this data?

Solution

I assume that your input.csv has four columns holding the values of lat1, lon1, lat2 and lon2. After the operation, output.csv is a separate file that contains those four columns plus a fifth column with the distance. You can use a for loop to do this. The method shown here reads each row, calculates the distance, and appends it to an initially empty list. That list becomes the new column "Distance", and the result is written to output.csv. Make changes wherever necessary. Remember that this works on a four-column csv file with multiple rows of coordinate values. Hope this helps. Have a great day.

import pandas as pd
import numpy as np
input_file = "input.csv"
output_file = "output.csv"
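df = pd.read_csv(input_file)                       #Dataframe specification
coord_cols = ['lat1', 'lon1', 'lat2', 'lon2']
#convert_objects is no longer available in current pandas; to_numeric turns unparseable values into NaN
df[coord_cols] = df[coord_cols].apply(pd.to_numeric, errors='coerce')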

def dist_from_coordinates(lat1, lon1, lat2, lon2):
  R = 6371  # Earth radius in km

  #conversion to radians
  d_lat = np.radians(lat2-lat1)
  d_lon = np.radians(lon2-lon1)

  r_lat1 = np.radians(lat1)
  r_lat2 = np.radians(lat2)

  #haversine formula
  a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2

  haversine = 2 * R * np.arcsin(np.sqrt(a))

  return haversine

new_column = []                    #empty list that will hold the distances
for index,row in df.iterrows():
  lat1 = row['lat1'] #latitude of the first point in this row
  lon1 = row['lon1'] #longitude of the first point in this row
  lat2 = row['lat2'] #latitude of the second point in this row
  lon2 = row['lon2'] #longitude of the second point in this row
  value = dist_from_coordinates(lat1, lon1, lat2, lon2)  #get the distance
  new_column.append(value)   #append the distance to the list

df.insert(4,"Distance",new_column)  #4 is the index where you want to place your column. Column index starts with 0. "Distance" is the header and new_column are the values in the column.

#write by path; opening the file in 'ab' mode would append to an existing output.csv on every run
df.to_csv(output_file,index = False)       #creates the output.csv
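
As a possible alternative to the row-by-row loop: because the distance function is built only from NumPy operations, it should also accept whole columns at once, so the "Distance" column can be computed without iterrows. The following is a minimal sketch under that assumption, not part of the original answer. The names haversine_km, add_distance_column and EARTH_RADIUS_KM, and the sample DataFrame standing in for input.csv, are made up for illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371  # same Earth radius in km as used in the answer above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works on scalars, NumPy arrays or pandas Series."""
    d_lat = np.radians(lat2 - lat1)
    d_lon = np.radians(lon2 - lon1)
    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)
    # haversine formula, evaluated element-wise for every row
    a = np.sin(d_lat / 2.0) ** 2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon / 2.0) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_distance_column(df):
    """Return a copy of df with a 'Distance' column appended as the last column."""
    out = df.copy()
    out["Distance"] = haversine_km(out["lat1"], out["lon1"], out["lat2"], out["lon2"])
    return out


# Hypothetical sample data standing in for input.csv (assumed column names)
sample = pd.DataFrame({
    "lat1": [0.0, 0.0],
    "lon1": [0.0, 0.0],
    "lat2": [0.0, 90.0],
    "lon2": [1.0, 0.0],
})
result = add_distance_column(sample)
print(result)
```

With a real file, the sample DataFrame would be replaced by pd.read_csv("input.csv"), and result.to_csv("output.csv", index=False) would write the output. The two sample rows are chosen so the result is easy to check by hand: one degree of longitude along the equator, and the equator-to-pole distance, which is a quarter of the Earth's circumference.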
