How to search and replace utf-8 special characters in Python?


Question

I'm a Python beginner, and I have a utf-8 problem.

I have a utf-8 string and I would like to replace all German umlauts with ASCII replacements (in German, the u-umlaut 'ü' may be rewritten as 'ue').

The u-umlaut has Unicode code point 252 (U+00FC), so I tried this:

>>> str = unichr(252) + 'ber'
>>> print repr(str)
u'\xfcber'
>>> print repr(str).replace(unichr(252), 'ue')
u'\xfcber'

I expected the last string to be u'ueber'.

What I ultimately want to do is replace all u-umlauts in a file with 'ue':

import sys
import codecs      
f = codecs.open(sys.argv[1],encoding='utf-8')
for line in f: 
    print repr(line).replace(unichr(252), 'ue')

Thanks for your help! (I'm using Python 2.3.)

Answer

I think it's simplest and clearest to do it more directly: use the Unicode literal u'ü' instead of unichr(252).

>>> s = u'über'
>>> s.replace(u'ü', 'ue')
u'ueber'
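
The same idea extends to every German umlaut. Below is a minimal Python 3 sketch; in Python 3 every str is Unicode, so neither the u prefix nor unichr is needed. The UMLAUTS table and the replace_umlauts name are illustrative assumptions, using the usual ä→ae, ö→oe, ü→ue spellings. str.maketrans builds a table from the dict, and str.translate applies all replacements in one pass.

```python
# Illustrative table: German umlauts mapped to their conventional ASCII spellings.
UMLAUTS = {
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
}

# str.maketrans accepts a dict mapping single characters to replacement strings.
UMLAUT_TABLE = str.maketrans(UMLAUTS)


def replace_umlauts(text):
    """Return text with every umlaut replaced by its ASCII spelling."""
    return text.translate(UMLAUT_TABLE)


print(replace_umlauts('über'))          # ueber
print(replace_umlauts('Äpfel und Öl'))  # Aepfel und Oel
```

Characters that are not in the table (including plain ASCII) pass through unchanged.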

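There's no need to use repr. It returns the "Python representation" of the string, in which ü appears as the escape sequence \xfc (the four ASCII characters \, x, f, c), so calling replace() on that result finds no ü to replace. Call replace() on the string itself and print the readable result.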
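
A small Python 3 sketch of why the question's version printed the string unchanged. chr is the Python 3 counterpart of unichr. One assumption to note: Python 3's repr() shows printable characters such as ü as-is, so ascii() is used here to reproduce the \xfc escape that Python 2's repr() produced.

```python
s = chr(252) + 'ber'       # 'über'; code point 252 is U+00FC
print(ord('ü'))            # 252

# ascii() builds a new string of ASCII characters only: ü becomes the
# four characters backslash, 'x', 'f', 'c' (like Python 2's repr()).
escaped = ascii(s)
print(escaped)                       # '\xfcber'
print('ü' in escaped)                # False: the character itself is gone
print(escaped.replace('ü', 'ue'))    # still '\xfcber', nothing to replace

# Replacing on the original string works as expected.
print(s.replace('ü', 'ue'))          # ueber
```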

You will also need to include the following line at the beginning of the .py file, if it's not already there, to declare the file's encoding:

#-*- coding: UTF-8 -*-

Added: Of course, the declared coding must match the actual encoding of the file. Please check this, as it can cause problems (for example, I had problems with Eclipse on Windows, because by default it saves files as cp1252). It should also match the encoding of the system, which could be utf-8, latin-1 or others.
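
A short Python 3 illustration of why the declared and actual encodings must agree: the same character ü is stored as different bytes in UTF-8 and in cp1252, so bytes written in one encoding are misread in the other. (Python 3 assumes UTF-8 for source files by default, so the coding line mainly matters for files saved in some other encoding.) The decodes_as_utf8 helper is a hypothetical name used only for this illustration.

```python
text = 'über'

utf8_bytes = text.encode('utf-8')      # ü takes two bytes in UTF-8
cp1252_bytes = text.encode('cp1252')   # ü takes one byte in cp1252
print(utf8_bytes, cp1252_bytes)        # b'\xc3\xbcber' b'\xfcber'

# UTF-8 bytes read as cp1252 do not fail; they silently become mojibake.
mojibake = utf8_bytes.decode('cp1252')
print(mojibake)                        # Ã¼ber


def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8 (hypothetical helper)."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# cp1252 bytes are not valid UTF-8, so reading them as UTF-8 raises an error.
print(decodes_as_utf8(utf8_bytes), decodes_as_utf8(cp1252_bytes))  # True False
```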

Also, don't use str as a variable name: it is the name of Python's built-in string type, and shadowing it can cause problems later.
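
A minimal Python 3 sketch of the kind of problem meant here; the function name is made up for illustration. Once str is rebound to a string, calling str(...) in the same scope no longer reaches the built-in type.

```python
def shadowing_demo():
    str = chr(252) + 'ber'   # this local name now hides the built-in str type
    try:
        return str(42)       # tries to call the string 'über', not the str type
    except TypeError as exc:
        return exc           # 'str' object is not callable


print(shadowing_demo())      # 'str' object is not callable
```

Naming the variable s (as in the answer's example) avoids the clash entirely.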

(I tested this on Python 2.6; I think the result is the same in Python 2.3.)
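
Applied to the file loop from the question, a Python 3 version might look like the sketch below. Assumptions: the input file is UTF-8, only the six umlauts are transliterated, and the result is printed rather than written back. The transliterate_file name is hypothetical. The built-in open(..., encoding='utf-8') takes the place of codecs.open, and there is no repr() call, so replace/translate sees the real ü characters.

```python
import sys

# Illustrative table: conventional ASCII spellings for the German umlauts.
UMLAUT_TABLE = str.maketrans({
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
})


def transliterate_file(path):
    """Yield each line of a UTF-8 text file with its umlauts replaced."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            yield line.translate(UMLAUT_TABLE)


if __name__ == '__main__' and len(sys.argv) > 1:
    for line in transliterate_file(sys.argv[1]):
        # Each line keeps its own newline, so suppress print's extra one.
        print(line, end='')
```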