How can I encode and decode percent-encoded (URL encoded) strings in Python?


Question

I wrote a simple application that downloads articles from wiki pages. When I search for, say, the first name Lech, my code returns strings like Lech_Kaczy%C5%84ski or Lech_Pozna%C5%84 instead of Lech_Kaczyński and Lech_Poznań.

How can I decode those characters to ordinary Polish letters? I tried urllib.unquote(text), but got Lech_Kaczy\xc5\x84ski and Lech_Pozna\xc5\x84 instead of Lech_Kaczyński and Lech_Poznań.

I have this in my code:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")

But the result is the same (it simply does not work).

Recommended answer

Try the following:

import urllib
urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8')

This will return a unicode string:

u'Lech_Kaczy\u0144ski'

which you can then print and process as usual. For example:

print(urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8'))

This prints:

Lech_Kaczyński
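
The answer above uses the Python 2 API. As a minimal sketch, assuming Python 3 (which the question does not use), the same functions live in urllib.parse. The variable names here are illustrative only. The example splits decoding into the two steps the answer performs: percent-decoding to raw bytes, then decoding those bytes as UTF-8. It also shows quote() for the encoding direction asked about in the title.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

encoded = "Lech_Kaczy%C5%84ski"

# Step 1: percent-decoding alone yields raw bytes.
# The pair 0xC5 0x84 is the UTF-8 encoding of the letter "ń".
raw = unquote_to_bytes(encoded)

# Step 2: decoding those bytes as UTF-8 yields readable text.
text = raw.decode("utf-8")

# unquote() performs both steps at once; UTF-8 is its default encoding.
decoded = unquote(encoded)

# quote() goes the other way: text is encoded as UTF-8,
# then every byte outside the safe set is percent-escaped.
reencoded = quote(decoded)

print(raw)
print(text)
print(reencoded)
```

In Python 3 the sys.setdefaultencoding workaround from the question is neither available nor needed, because str is already Unicode text.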
