URL component % and \x

Views: 18 Published: 2021/9/15 18:35:47 Tags: python urllib2 urllib


Question

I have a question.

import urllib2

st = "b%C3%BCrokommunikation"
urllib2.unquote(st)

Output: 'b\xc3\xbcrokommunikation'. But if I print it:

print urllib2.unquote(st)

Output: bürokommunikation

Why is there a difference? I have to write bürokommunikation, not 'b\xc3\xbcrokommunikation', to a file.

My problem is: I have lots of data with such values extracted from URLs, and I have to store them in a text file in their readable form, e.g. bürokommunikation.

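Recommended answer

When you evaluate the expression in the interactive interpreter, Python shows the repr() of the byte string, which escapes the non-ASCII bytes as \xc3\xbc. When you print the string, the raw bytes are sent to your terminal emulator, which recognizes \xc3\xbc as the UTF-8 encoding of ü and displays it correctly.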
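The difference between the two outputs can be reproduced with a short sketch. It assumes Python 3, where `urllib2.unquote` has moved to `urllib.parse`; `unquote_to_bytes` is used here only to get the same raw bytes that the Python 2 call in the question returns.

```python
from urllib.parse import unquote, unquote_to_bytes

st = "b%C3%BCrokommunikation"

# The raw bytes after percent-decoding: 0xC3 0xBC is the UTF-8 encoding of ü.
raw = unquote_to_bytes(st)

# repr() escapes non-ASCII bytes; this is what the interactive prompt echoes.
print(repr(raw))   # b'b\xc3\xbcrokommunikation'

# Decoding the bytes as UTF-8 gives the text that a UTF-8 terminal displays.
text = raw.decode("utf-8")
print(text)        # bürokommunikation

# In Python 3, unquote() performs the UTF-8 decoding itself by default.
assert unquote(st) == text
```

The data is the same in both cases; only the way it is displayed differs.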

However, as @MarkDickinson says in the comments, ü doesn't exist in ASCII, so you need to decode the bytes into a unicode string and tell Python which encoding to use when writing it to a file, for instance UTF-8.

This is easy with the codecs module:

import codecs
import urllib2

# First decode the UTF-8 bytes into a Python unicode string
st = "b%C3%BCrokommunikation"
decoded_string = urllib2.unquote(st).decode('utf-8')

# Write it to a file, encoding it as UTF-8
with codecs.open('my_file.txt', 'w', 'utf-8') as f:
    f.write(decoded_string)
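For the case described in the question, where many such values have to be stored, a rough Python 3 equivalent might look like the sketch below. It assumes Python 3, where the built-in `open` accepts an `encoding` argument and `urllib.parse.unquote` decodes UTF-8 by default; the helper `write_unquoted`, the sample values, and the temporary output path are hypothetical and only serve the illustration.

```python
import os
import tempfile
from urllib.parse import unquote


def write_unquoted(values, path):
    """Percent-decode each value and write it to `path` as UTF-8, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for value in values:
            f.write(unquote(value) + "\n")


# Hypothetical sample data and a temporary output path.
samples = ["b%C3%BCrokommunikation", "caf%C3%A9"]
out_path = os.path.join(tempfile.mkdtemp(), "my_file.txt")
write_unquoted(samples, out_path)

# Read the file back with the same encoding to check the result.
with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```

Passing the encoding explicitly avoids depending on the platform's default encoding when the file is written or read back.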
