Get location coordinates using the Bing or Google API in Python


Problem description


    Here is my problem. I have a sample text file in which I store text data gathered by crawling various HTML pages. The text contains information about various events and their times and locations. I want to fetch the coordinates of these locations, but I have no idea how to do that in Python. I am using nltk to recognize named entities in this sample text. Here is the code:

    import nltk
    
    with open('sample.txt', 'r') as f:
        sample = f.read()
    
    sentences = nltk.sent_tokenize(sample)
    tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
    chunked_sentences = nltk.batch_ne_chunk(tagged_sentences, binary=True)
    
    #print chunked_sentences
    #print tokenized_sentences
    #print tagged_sentences
    
    def extract_entity_names(t):
        entity_names = []
    
        if hasattr(t, 'node') and t.node:
            if t.node == 'NE':
                entity_names.append(' '.join([child[0] for child in t]))
            else:
                for child in t:
                    entity_names.extend(extract_entity_names(child))
    
        return entity_names
    
    entity_names = []
    for tree in chunked_sentences:
        # Print results per sentence
        # print extract_entity_names(tree)
    
        entity_names.extend(extract_entity_names(tree))
    
    # Print all entity names
    #print entity_names
    
    # Print unique entity names
    print set(entity_names)
    

    The sample file looks something like this:

    La bohème at Covent Garden

    When: 18 Jan 2013 (various dates) , 7.30pm Where: Covent Garden, London, John Copley's perennially popular Royal Opera production of Puccini's La bohème is revived for the first of two times this season, aptly over the Christmas period. Sir Mark Elder conducts Rolando Villazón as Rodolfo and Maija Kovalevska as Mimì. Mimì meets poet Rodolfo (Dmytro Popov sings the role on 5 and 18 January) one cold Christmas Eve in Paris' Latin Quarter. Fumbling around in the dark after her candle has gone out, they fall in love. Rodolfo lives with three other lads: philosopher Colline (Nahuel di Pierro/Jihoon Kim on 18 January), musician Schaunard (David Bizic) and painter Marcello (Audun Iversen), who loves Musetta (Stefania Dovhan). Both couples break up and the opera ends in tragedy as Rodolfo finds Mimì dying of consumption in a freezing garret.

    I want to fetch the coordinates for Covent Garden, London from this text. How can I do it?

    Solution

    You really have two questions:

    1. How to extract location text (or potential location text).
    2. How to get location (latitude, longitude) by calling a Geocoding service with location text.

    I can help with the second question. (But see edit below for some help with your first question.)

    With the old Google Maps API (which is still working), you could get the geocoding down to one line (one ugly line):

    def geocode(address):
        return tuple([float(s) for s in list(urllib.urlopen('http://maps.google.com/maps/geo?' + urllib.urlencode({'output': 'csv','q': address})))[0].split(',')[2:]])
    

    Check out the Google Maps API Geocoding Documentation.
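
For readers on Python 3, here is a minimal sketch of the same lookup against the JSON form of Google's Geocoding web service, which may be the safer choice if the legacy `maps/geo` endpoint no longer responds. The function names, the placeholder API key argument, and the hand-written sample response are assumptions made for illustration; they are not part of the original answer. URL building and response parsing are kept separate so the parsing can be tried without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Return the request URL for one address lookup."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def parse_geocode_response(payload):
    """Extract (lat, lng) from a decoded JSON reply, or raise ValueError."""
    if payload.get("status") != "OK" or not payload.get("results"):
        raise ValueError("geocoding failed: %s" % payload.get("status"))
    location = payload["results"][0]["geometry"]["location"]
    return float(location["lat"]), float(location["lng"])


def geocode(address, api_key):
    """Perform the network call; needs a valid key and network access."""
    with urlopen(build_geocode_url(address, api_key)) as resp:
        return parse_geocode_response(json.load(resp))


# Offline check with a hand-written reply of the assumed shape
# (the coordinates are illustrative, not a recorded API result).
sample = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 51.5117, "lng": -0.1240}}}],
}
print(parse_geocode_response(sample))
```

Calling `geocode('Covent Garden, London', your_key)` would perform the real request; only the offline parsing step is exercised here.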

    Here’s the readable 7-line version plus some wrapper code (when calling from the command line, remember to enclose the address in quotes):

    import sys
    import urllib
    
    googleGeocodeUrl = 'http://maps.google.com/maps/geo?'
    
    def geocode(address):
        parms = {
            'output': 'csv',
            'q': address}
    
        url = googleGeocodeUrl + urllib.urlencode(parms)
        resp = urllib.urlopen(url)
        resplist = list(resp)
        line = resplist[0].strip()  # drop the trailing newline
        status, accuracy, latitude, longitude = line.split(',')
        return latitude, longitude
    
    def main():
        if 1 < len(sys.argv):
            address = sys.argv[1]
        else:
            address = '1600 Amphitheatre Parkway, Mountain View, CA 94043, USA'
    
        coordinates = geocode(address)
        print coordinates
    
    if __name__ ==  '__main__':
        main()
    

    It's simple to parse the CSV format, but the XML format has better error reporting.
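
Since the CSV reply carries its status in the first field, a little checking can be added while parsing. The sketch below is in Python 3 and assumes the `status,accuracy,latitude,longitude` layout used in the code above, and that `200` was the success code of the legacy service. The helper name `parse_geo_csv` and the sample line are made up for illustration.

```python
def parse_geo_csv(line):
    """Parse 'status,accuracy,latitude,longitude' into (lat, lng) floats."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        raise ValueError("unexpected CSV reply: %r" % line)
    status, accuracy, latitude, longitude = fields
    # Assumption: 200 was the success code of the legacy service.
    if status != "200":
        raise ValueError("geocoding failed with status %s" % status)
    return float(latitude), float(longitude)


# Hand-written sample line in the assumed layout.
print(parse_geo_csv("200,8,37.4219720,-122.0841430\n"))
```

Returning floats rather than strings also avoids carrying the trailing newline along with the longitude.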

    Edit - Help with your first question

    I looked into nltk. It's not trivial, but I can recommend the Natural Language Toolkit documentation, Chapter 7 - Extracting Information from Text, specifically section 7.5, Named Entity Recognition. At the end of the section, they point out:

    NLTK provides a classifier that has already been trained to recognize named entities, accessed with the function nltk.ne_chunk(). If we set the parameter binary=True, then named entities are just tagged as NE; otherwise, the classifier adds category labels such as PERSON, ORGANIZATION, and GPE.

    You're specifying True, but you probably want the category labels, so:

    chunked_sentences = nltk.batch_ne_chunk(tagged_sentences)
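
Once the chunks carry category labels, place-like entities could be filtered out of each tree. The sketch below is in Python 3 and assumes NLTK 3, where the `node` attribute used in the question became the `label()` method. `StubTree` is a hypothetical stand-in for `nltk.Tree` so the example runs without NLTK installed, and the labelled sentence is hand-made, not real chunker output.

```python
def extract_by_label(tree, wanted=("GPE", "LOCATION")):
    """Collect the text of every subtree whose label is in `wanted`."""
    found = []
    for child in tree:
        # Leaves are (word, tag) tuples; subtrees expose label().
        if hasattr(child, "label"):
            if child.label() in wanted:
                found.append(" ".join(word for word, tag in child))
            else:
                found.extend(extract_by_label(child, wanted))
    return found


class StubTree(list):
    """Hypothetical stand-in for nltk.Tree so the sketch runs without NLTK."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


# Hand-labelled sentence; a real chunker may label these words differently.
sentence = StubTree("S", [
    ("Where", "WRB"), (":", ":"),
    StubTree("GPE", [("Covent", "NNP"), ("Garden", "NNP")]),
    (",", ","),
    StubTree("GPE", [("London", "NNP")]),
    StubTree("PERSON", [("John", "NNP"), ("Copley", "NNP")]),
])
print(extract_by_label(sentence))
```

The same function should accept real `nltk.Tree` objects, since it relies only on iteration and `label()`.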
    

    Without binary=True, the chunker provides category labels (named entity types), which seemed promising. But after trying this on your text and on a few simple phrases containing locations, it's clear that more rules are needed. Read the documentation for more info.
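
As one example of such a rule, the sample listing marks its venue with a "Where:" prefix. The Python 3 sketch below assumes that the first two comma-separated fields after that prefix are the location. This is a guess based on the single sample shown, not a general solution; the rule and the function name are hypothetical.

```python
import re

# Hypothetical rule: the venue is the first two comma-separated
# fields that follow the "Where:" marker.
WHERE_RULE = re.compile(r"Where:\s*([^,]+),\s*([^,]+),")


def location_from_listing(text):
    """Return 'venue, city' from a listing, or None if the rule does not match."""
    match = WHERE_RULE.search(text)
    if match is None:
        return None
    return ", ".join(part.strip() for part in match.groups())


listing = ("When: 18 Jan 2013 (various dates) , 7.30pm Where: Covent Garden, "
           "London, John Copley's perennially popular Royal Opera production")
print(location_from_listing(listing))
```

The resulting string could then be handed to a geocoding call, with entities found by the chunker serving as a fallback when the rule does not match.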
