label empty or too long - python urllib2


Problem description

I have run into a strange situation.

I am checking URLs like this:

import httplib2

def check_urlstatus(url):
    h = httplib2.Http()
    try:
        # HEAD request: only the status line and headers are needed
        resp = h.request("http://" + url, 'HEAD')
        if int(resp[0]['status']) < 400:
            return 'ok'
        else:
            return 'bad'
    except httplib2.ServerNotFoundError:
        return 'bad'

If I try to test this with:

if check_urlstatus('.f.de') == "bad": #<--- error happening here
   #..
   #..

it fails with:

UnicodeError: label empty or too long

What am I doing wrong here?

EDIT: the traceback goes through idna. My guess is that it splits the input on '.', and in this case the first label, the part before the first '.', is empty.

Solution

The problem is that your hostname cannot be encoded under the IDNA rules, which govern how internationalized domain names are converted:

The conversions between ASCII and non-ASCII forms of a domain name are accomplished by algorithms called ToASCII and ToUnicode. These algorithms are not applied to the domain name as a whole, but rather to individual labels. For example, if the domain name is www.example.com, then the labels are www, example, and com. ToASCII or ToUnicode are applied to each of these three separately.

The details of these two algorithms are complex, and are specified in RFC 3490. The following gives an overview of their function.

ToASCII leaves unchanged any ASCII label, but will fail if the label is unsuitable for the Domain Name System. If given a label containing at least one non-ASCII character, ToASCII will apply the Nameprep algorithm, which converts the label to lowercase and performs other normalization, and will then translate the result to ASCII using Punycode[16] before prepending the four-character string "xn--".[17] This four-character string is called the ASCII Compatible Encoding (ACE) prefix, and is used to distinguish Punycode encoded labels from ordinary ASCII labels. The ToASCII algorithm can fail in several ways; for example, the final string could exceed the 63-character limit of a DNS name. A label for which ToASCII fails cannot be used in an internationalized domain name.
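To make the per-label behaviour concrete, here is a minimal sketch using Python's built-in idna codec, the same codec that appears in the traceback. The hostname bücher.example is only an illustrative assumption and does not come from the question.

```python
# Illustrative hostname (an assumption, not taken from the question).
# The escape \u00fc is the letter "u" with an umlaut.
name = u"b\u00fccher.example"

# The idna codec works on individual labels, so split on "." to see them.
labels = name.split(u".")

# Encoding the whole name converts the non-ASCII label with Punycode and
# prepends "xn--"; the plain ASCII label "example" is left unchanged.
encoded = name.encode("idna")

# The name from the question starts with ".", so its first label is empty.
question_labels = u".f.de".split(u".")

print(labels)
print(encoded)
print(question_labels)
```

The encoded name here is xn--bcher-kva.example, while the split of '.f.de' yields an empty string as its first label, which is the label ToASCII rejects.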

In your case the first label of '.f.de' is the empty string '', which is not a valid DNS label, so you end up with this:

>>> '.f.de'.encode('idna')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/idna.py", line 164, in encode
    result.append(ToASCII(label))
  File "/usr/lib/python2.6/encodings/idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

If you change the domain name to 'a.f.de' it should not raise this exception.
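If malformed names should simply count as 'bad' rather than raise, one option is to test the hostname against the idna codec before making the request. The helper below is a hypothetical sketch (the name is_idna_encodable is not part of httplib2 or the original code) and relies only on the codec raising UnicodeError:

```python
def is_idna_encodable(hostname):
    """Return True if the built-in idna codec accepts the hostname."""
    try:
        hostname.encode("idna")
    except UnicodeError:
        # Raised for an empty label (as in ".f.de") or a label that is
        # longer than 63 characters.
        return False
    return True


# Hypothetical sample inputs: one valid name, one with an empty first
# label, one with a 64-character label.
candidates = ["a.f.de", ".f.de", "a" * 64 + ".de"]
ok_names = [h for h in candidates if is_idna_encodable(h)]
print(ok_names)
```

In check_urlstatus, such a check could run before h.request. Alternatively, adding UnicodeError to the except clause should have a similar effect.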
