How to fill missing geo location in datasets?

Problem description

I have a dataset in which geolocation names and coordinates are both missing in places. I want to fill in the gaps so that I can proceed with further analysis of the data. The dataset was harvested from Twitter, so it was not constructed by hand; this is how the data arrived, and I need to fill in the gaps somehow before continuing.

Option 1: I can use either userLocation or userTimezone to find the coordinates.

Input:

userLocation,   userTimezone,   Coordinates,
India,          Hawaii,    {u'type': u'Point', u'coordinates': [73.8567, 18.5203]}
California,     USA     
          ,     New Delhi,  
Ft. Sam Houston,Mountain Time (US & Canada),{u'type': u'Point', u'coordinates': [86.99643, 23.68088]}
Kathmandu,Nepal, Kathmandu, {u'type': u'Point', u'coordinates': [85.3248024, 27.69765658]}

Expected Output

userLocation,    userTimezone,                 Coordinates_one, Coordinates_two
India,           Hawaii,                       73.8567,         18.5203
California,      USA,                          [fill this],     [fill this]
[fill this],     New Delhi,                    [fill this],     [fill this]
Ft. Sam Houston, Mountain Time (US & Canada),  86.99643,        23.68088
Kathmandu,       Kathmandu,                    85.3248024,      27.69765658

Is it possible to write a script in Python or pandas that fills in the missing location names and coordinates at the same time and formats the output properly?

I understand that Python and pandas do not have a magic package for this, but something to start with would be helpful.

I asked this question in the GIS section but did not get much help there. This is the first time I have worked with a geolocation dataset, and I have no clue how to start. If the question is not suitable, please comment asking me to delete it instead of downvoting.

Solution

As others have mentioned on your GIS question, there is no magical way to produce something accurate, but I would play around with geopy. I assume you are able to loop over your missing data; here is example code and output demonstrating geopy:

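from geopy.geocoders import Nominatim

# Current geopy releases require an identifying user_agent for Nominatim;
# "geo-fill-example" is a placeholder, use your own application name.
geolocator = Nominatim(user_agent="geo-fill-example")

for location in ('California USA', 'New Delhi'):
    geoloc = geolocator.geocode(location)  # returns None when nothing matches
    print(location, ':', geoloc, geoloc.latitude, geoloc.longitude)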

Output:

California USA : California, United States of America 36.7014631 -118.7559974 
New Delhi : New Delhi, New Delhi District, Delhi, India 28.6138967 77.2159562
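
To make the "loop over your missing data" step concrete, here is a minimal sketch in plain Python. It assumes the rows have already been read into dicts with the column names from the expected output. The names fill_missing, Point and fake_geocode are made up for illustration. The geocoder is passed in as a function, so with geopy you would pass geolocator.geocode instead of the offline stand-in used here. It tries userLocation first and falls back to userTimezone, and it does not attempt to fill in a missing userLocation name.

```python
from collections import namedtuple

# Stand-in for a geocoding result; only .latitude and .longitude are used.
Point = namedtuple("Point", ["latitude", "longitude"])


def _is_blank(value):
    """True for None, empty strings and whitespace-only strings."""
    return value is None or str(value).strip() == ""


def fill_missing(rows, geocode):
    """Fill blank coordinate fields in place and return the rows.

    rows    -- list of dicts with userLocation, userTimezone,
               Coordinates_one and Coordinates_two keys
    geocode -- callable taking a string and returning an object with
               .latitude and .longitude, or None when nothing matches
    """
    for row in rows:
        if not (_is_blank(row.get("Coordinates_one"))
                or _is_blank(row.get("Coordinates_two"))):
            continue  # both coordinates already present
        for field in ("userLocation", "userTimezone"):
            text = row.get(field)
            if _is_blank(text):
                continue
            hit = geocode(str(text).strip())
            if hit is not None:
                # The sample data stores longitude first, then latitude.
                row["Coordinates_one"] = hit.longitude
                row["Coordinates_two"] = hit.latitude
                break
    return rows


def fake_geocode(text):
    """Offline stand-in for a real geocoder (hypothetical lookup table)."""
    known = {
        "California": Point(36.7014631, -118.7559974),
        "New Delhi": Point(28.6138967, 77.2159562),
    }
    return known.get(text)


rows = [
    {"userLocation": "India", "userTimezone": "Hawaii",
     "Coordinates_one": 73.8567, "Coordinates_two": 18.5203},
    {"userLocation": "California", "userTimezone": "USA",
     "Coordinates_one": "", "Coordinates_two": ""},
    {"userLocation": "", "userTimezone": "New Delhi",
     "Coordinates_one": "", "Coordinates_two": ""},
    {"userLocation": "Nowhere", "userTimezone": "",
     "Coordinates_one": "", "Coordinates_two": ""},
]
fill_missing(rows, fake_geocode)
```

Rows that cannot be geocoded from either field are left blank, which is probably safer than guessing, since timezone names in particular are a weak hint about where a user actually is.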

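You may want to try different geocoding services (see the geopy documentation). Some of these services take additional arguments; for example, Nominatim can take the "country_bias" keyword, which biases results toward the given country (geopy 2.x drops it in favor of the "country_codes" argument of geocode, which restricts results to the listed countries).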
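
For the output format asked for in the question, the existing Coordinates cells also need to be split into two columns. A minimal sketch, assuming each cell holds the Python-dict text shown in the input (the function name split_coordinates is made up for illustration):

```python
import ast


def split_coordinates(raw):
    """Return (Coordinates_one, Coordinates_two) from a raw Coordinates cell.

    Blank or unparsable cells yield (None, None).
    """
    if raw is None or not str(raw).strip():
        return (None, None)
    try:
        # literal_eval safely parses the dict text, including u'' prefixes.
        value = ast.literal_eval(str(raw).strip())
        first, second = value["coordinates"]
    except (ValueError, SyntaxError, KeyError, TypeError):
        return (None, None)
    return (first, second)


sample = "{u'type': u'Point', u'coordinates': [73.8567, 18.5203]}"
pair = split_coordinates(sample)
```

ast.literal_eval is used instead of eval because the cells come from harvested data and should not be executed as code. Cells that return (None, None) are the ones left to fill by geocoding.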
