Google http://maps.google.com/maps/geo query with non-English characters


Problem description

I'm writing a Python parser (using urllib2) for addresses that contain non-English characters. The goal is to find the coordinates of every address.

When I open this url in Firefox:

http://maps.google.com/maps/geo?q=Czech%20Republic%2010000%20Male%C5%A1ice&output=csv

it is converted (changes in address box) to

http://maps.google.com/maps/geo?q=Czech Republic 10000 Malešice&output=csv

and returns

200,6,50.0865113,14.4918052

which is a correct result.
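The two URLs above are the same address in percent-encoded and decoded form. A minimal sketch of that round trip, written for Python 3 (where `urllib2.quote` lives in `urllib.parse`); the variable names are illustrative only:

```python
from urllib.parse import quote, unquote

address = 'Czech Republic 10000 Malešice'

# quote() encodes spaces as %20 and "š" as its UTF-8 bytes, %C5%A1.
encoded = quote(address)

# unquote() reverses the encoding, which is what the Firefox address bar displays.
decoded = unquote(encoded)

print(encoded)
print(decoded == address)
```

So the encoding of the query itself is identical in both cases; the difference in results has to come from something else in the request.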

However, if I open the same URL (percent-encoded, with %20 and so on) with urllib2 (or in the Opera browser), the result is

200,4,49.7715220,13.2955410

which is incorrect. How can I open the first URL with urllib2 and get the "200,6,50.0865113,14.4918052" result?
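To compare the two responses in code rather than by eye, the CSV line can be split into its four fields. This is only a sketch: the field names (`status`, `accuracy`, `lat`, `lng`) are an assumption about what the four columns mean, and `GeoResult` and `parse_geo_csv` are hypothetical helpers, not part of any library.

```python
from typing import NamedTuple


class GeoResult(NamedTuple):
    # Field names are assumed; the thread only shows four comma-separated values.
    status: int
    accuracy: int
    lat: float
    lng: float


def parse_geo_csv(line: str) -> GeoResult:
    """Split one CSV response line into typed fields."""
    status, accuracy, lat, lng = line.strip().split(',')
    return GeoResult(int(status), int(accuracy), float(lat), float(lng))


firefox_result = parse_geo_csv('200,6,50.0865113,14.4918052')
urllib2_result = parse_geo_csv('200,4,49.7715220,13.2955410')

# Both requests succeed, but the second and later fields differ.
print(firefox_result)
print(urllib2_result)
```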

Edit:

Code used

import urllib2

psc = '10000'
name = 'Malešice'
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % urllib2.quote('Czech Republic %s %s' % (psc, name))

response = urllib2.urlopen(url)
data = response.read()

print 'Parsed url %s, result %s\n' % (url, data)

Output

Parsed url http://maps.google.com/maps/geo?q=Czech%20Republic%2010000%20Male%C5%A1ice&output=csv, result 200,4,49.7715220,13.2955410

Solution

I can reproduce this behavior, and at first I was dumbfounded as to why it happens. Closer inspection of the HTTP requests with Wireshark showed that the requests sent by Firefox (not surprisingly) contain a few more HTTP headers.

In the end it turned out that the Accept-Language header makes the difference. You only get the correct result if

  • an Accept-Language header is set
  • and it lists a non-English language first (the priorities don't seem to matter)
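The two conditions above can be sketched as a small check on a header value. This encodes only the behavior observed in this answer, not a documented rule of the geocoding service, and `first_language_is_non_english` is a hypothetical helper name.

```python
def first_language_is_non_english(accept_language: str) -> bool:
    """Return True if the header is non-empty and its first entry is not English.

    Quality values (";q=...") are ignored, matching the observation that
    priorities do not seem to matter.
    """
    # Take the first comma-separated entry and drop any ";q=..." suffix.
    first = accept_language.split(',')[0].split(';')[0].strip().lower()
    if not first or first == '*':
        return False
    return not (first == 'en' or first.startswith('en-'))


print(first_language_is_non_english('de-ch,en'))
print(first_language_is_non_english('en-US,en;q=0.9'))
```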

So, for example this Accept-Language header works:

headers = {'Accept-Language': 'de-ch,en'}

To summarize, your code works for me when modified like this:

# -*- coding: utf-8 -*-
import urllib2

psc = '10000'
name = 'Malešice'
url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % urllib2.quote('Czech Republic %s %s' % (psc, name))
headers = {'Accept-Language': 'de-ch,en'}

req = urllib2.Request(url, None, headers)
response = urllib2.urlopen(req)
data = response.read()

print 'Parsed url %s, result %s\n' % (url, data)
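The code above targets Python 2. In Python 3, `urllib2` was split into `urllib.request` and `urllib.parse`, so a rough equivalent might look like the sketch below. It only builds the request and does not send it, since the endpoint from the question may no longer be available; `build_geo_request` is a hypothetical helper name.

```python
from urllib.parse import quote
from urllib.request import Request


def build_geo_request(psc: str, name: str) -> Request:
    """Build the geocoding request with a non-English language listed first."""
    query = quote('Czech Republic %s %s' % (psc, name))
    url = 'http://maps.google.com/maps/geo?q=%s&output=csv' % query
    # Same header value as in the answer; any non-English first entry
    # reportedly had the same effect.
    return Request(url, headers={'Accept-Language': 'de-ch,en'})


req = build_geo_request('10000', 'Malešice')
print(req.full_url)
print(req.header_items())

# To actually send it (not executed here):
#   from urllib.request import urlopen
#   data = urlopen(req).read().decode('utf-8')
```

Note that `urllib.request` normalizes header names with `str.capitalize()`, so the header is stored as `Accept-language`; HTTP header names are case-insensitive, so this does not change the request's meaning.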

Note: In my opinion, this is a bug in Google's geocoding API. The Accept-Language header indicates what languages the user agent prefers the content in, but it shouldn't have any effect on how the request is interpreted.
