URL component % and \x
Problem description
I have a question.
import urllib2
st = "b%C3%BCrokommunikation"
urllib2.unquote(st)
Output: 'b\xc3\xbcrokommunikation'. But if I print it:
print urllib2.unquote(st)
Output: bürokommunikation
Why is there a difference? I have to write bürokommunikation, not 'b\xc3\xbcrokommunikation', into a file.
My problem is that I have lots of data with such values extracted from URLs, and I have to store them in a text file in readable form, e.g. as bürokommunikation.
Recommended answer
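When you evaluate the expression at the interactive prompt, Python shows the repr() of the byte string, in which the two non-ASCII bytes appear as the escapes \xc3\xbc. When you print the string, the raw bytes are sent to your terminal emulator, which recognizes \xc3\xbc as the UTF-8 encoding of the character ü and displays it correctly. The string is the same in both cases; only its presentation differs.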
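The difference between the echoed value and the actual text can be reproduced in a few lines. This is only a minimal sketch, and it assumes Python 3 rather than the Python 2 used in the question: there, `urllib.parse.unquote_to_bytes` returns the same raw bytes that `urllib2.unquote` returns in Python 2.

```python
from urllib.parse import unquote_to_bytes

# Percent-decode into raw bytes (the Python 3 counterpart of the Python 2 result)
raw = unquote_to_bytes("b%C3%BCrokommunikation")

# repr() renders non-ASCII bytes as \x escapes; this is what the prompt echoes
print(repr(raw))  # b'b\xc3\xbcrokommunikation'

# Decoding the bytes as UTF-8 turns the two bytes \xc3\xbc into the single character ü
text = raw.decode("utf-8")
print(len(raw), len(text))  # 18 17
```

The byte string is one element longer than the decoded text because ü occupies two bytes in UTF-8.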
However, as @MarkDickinson says in the comments, ü doesn't exist in ASCII, so you need to tell Python that the text you want to write to a file is Unicode, and which encoding to use when writing it, for instance UTF-8.
This is quite easy with the codecs library:
import codecs
import urllib2

# Decode the UTF-8 bytes returned by unquote() into a unicode string
st = "b%C3%BCrokommunikation"
decoded_string = urllib2.unquote(st).decode('utf-8')

# Write it to the file, encoding it as UTF-8 again
with codecs.open('my_file.txt', 'w', 'utf-8') as f:
    f.write(decoded_string)
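For readers on Python 3, where `urllib2` no longer exists, the same idea might look like the sketch below. It is not part of the original answer: it assumes Python 3, and the temporary directory is a hypothetical location chosen only so that the example can run anywhere. `urllib.parse.unquote` decodes the percent-escapes as UTF-8 by default, and the built-in `open` accepts an `encoding` argument, so `codecs` is not needed.

```python
import os
import tempfile
from urllib.parse import unquote

st = "b%C3%BCrokommunikation"

# unquote() returns a str; the escapes are interpreted as UTF-8 (stated explicitly here)
text = unquote(st, encoding="utf-8")

# Hypothetical output location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "my_file.txt")

# Name the encoding explicitly so the result does not depend on the system locale
with open(path, "w", encoding="utf-8") as f:
    f.write(text + "\n")

# Read the file back in binary mode to inspect the bytes that were stored
with open(path, "rb") as f:
    data = f.read()
print(repr(data))  # b'b\xc3\xbcrokommunikation\n'
```

A text editor or terminal that reads the file as UTF-8 shows it as bürokommunikation.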