ElementTree will not parse special characters with Python 2.7


Problem description

I had to rewrite my Python script from Python 3 to Python 2, and after that I ran into a problem parsing special characters with ElementTree.

This is a piece of my XML:

<account number="89890000" type="Kostnad" taxCode="597" vatCode="">Avsättning egenavgifter</account>

This is the output when I parse this row:

('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avs\xc3\xa4ttning egenavgifter')

So there seems to be a problem with the character "ä".

This is how I do it in the code:

sys.setdefaultencoding( "UTF-8" )
xmltree = ET()

xmltree.parse("xxxx.xml")

printAccountPlan(xmltree)

def printAccountPlan(xmltree):
    print("account:", str(i.attrib['number']), "AccountType:", str(i.attrib['type']), "Name:", str(i.text))

Does anyone have an idea how to get ElementTree to parse the character "ä", so that the result looks like this:

('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')


Recommended answer

You're running into two separate differences between Python 2 and Python 3 at the same time, which is why you're getting unexpected results.

The first difference is one you're probably already aware of: Python's print statement in version 2 became a print function in version 3. That change creates a special circumstance in your case, which I'll get to a little later. But briefly, this is the difference in how 'print' works:

In Python 3:

>>> # Two arguments 'Hi' and 'there' get passed to the function 'print'.
>>> # They are joined with a space separator and printed.
>>> print('Hi', 'there')
Hi there

In Python 2:

>>> # 'print' is a statement which doesn't need parentheses.
>>> # The parentheses instead create a tuple containing two elements,
>>> # 'Hi' and 'there'. This tuple is then printed.
>>> print('Hi', 'there')
('Hi', 'there')
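
The same contrast can be reproduced on Python 3 alone by passing a tuple explicitly. The following is a minimal sketch, not part of the original answer; the helper name `captured` is hypothetical and exists only so the two outputs can be compared as strings.

```python
import io
from contextlib import redirect_stdout


def captured(*args):
    """Return what print(*args) writes, without the trailing newline."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        print(*args)
    return buf.getvalue().rstrip("\n")


# Two separate arguments: print joins them with a space.
two_args = captured('Hi', 'there')

# One tuple argument: print shows the tuple's own representation.
# This is what the Python 2 statement print('Hi', 'there') does.
one_tuple = captured(('Hi', 'there'))

print(two_args)
print(one_tuple)
```

The second line of output has the same shape as the output in the question, which suggests the parentheses there were being read as a tuple rather than as a function call.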

The second problem in your case is that a tuple prints itself by calling repr() on each of its elements. In Python 3, repr() displays Unicode text as you want. But in Python 2, repr() of a byte string uses escape sequences for any byte value outside the printable ASCII range (for example, anything above 127). That is why you see \xc3\xa4, the two UTF-8 bytes that encode "ä", in place of the character itself.
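
To see where the escapes come from, here is a small sketch written for Python 3, where a `bytes` object stands in for a Python 2 byte string. It is an illustration under that assumption, not a reproduction of the Python 2 session.

```python
# The account name from the question as Unicode text.
name = "Avsättning egenavgifter"

# Encoding to UTF-8 turns "ä" (U+00E4) into the two bytes 0xC3 0xA4.
encoded = name.encode("utf-8")

# repr() of the bytes escapes every non-ASCII byte, as Python 2 did for str.
escaped = repr(encoded)
print(escaped)

# A tuple is displayed using repr() of each element, so the escapes show up.
print((encoded,))

# The data itself is intact: decoding gives back the original text.
restored = encoded.decode("utf-8")
print(restored)
```

The escaped form and the readable form describe the same data; only the display differs.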

You may or may not need to resolve this, depending on what your goal is with your code. The representation of a tuple in Python 2 uses escape characters because it is not designed to be displayed to an end user. It is more for your internal convenience as a developer, for troubleshooting and similar tasks. If you're simply printing it for yourself, then you may not need to change a thing, because Python is showing you that the encoded bytes for that non-ASCII character are correctly there in your string. If you do want to display something to the end user that has the format of how tuples look, then one way to do it (which retains correct printing of Unicode) is to create the formatting manually, like this:

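def printAccountPlan(xmltree):
    # Assumption: the loop that defines `i` was not shown in the question;
    # iterating over every <account> element is a guess at what it did.
    for i in xmltree.iter('account'):
        data = (i.attrib['number'], i.attrib['type'], i.text)
        # Python 2 print statement: the string is printed directly, so no repr() escaping occurs.
        print "('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % data
# Produces this:
# ('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')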
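
For a version that can be run without the original file, here is a self-contained sketch of the same manual-formatting idea. It is written for Python 3 and makes several assumptions: the `<accounts>` root element, the helper name `format_account`, and the use of `iter('account')` are all hypothetical, since the question shows only a single row of the XML.

```python
import xml.etree.ElementTree as ET

# The row from the question, wrapped in a hypothetical <accounts> root
# (the real document structure was not shown).
XML = (
    '<accounts>'
    '<account number="89890000" type="Kostnad" taxCode="597" vatCode="">'
    'Avsättning egenavgifter</account>'
    '</accounts>'
)


def format_account(elem):
    """Build the tuple-like line by hand so no repr() escaping is applied."""
    data = (elem.attrib['number'], elem.attrib['type'], elem.text)
    return "('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % data


root = ET.fromstring(XML)
lines = [format_account(acc) for acc in root.iter('account')]
for line in lines:
    print(line)
```

Returning the formatted string instead of printing inside the function keeps the formatting separate from the output step, which makes it easier to check or reuse.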
