How do I unescape HTML entities in a string in Python 3.1?


Question

I have looked all around and only found solutions for Python 2.6 and earlier, NOTHING on how to do this in Python 3.x. (I only have access to a Win7 box.)

I HAVE to be able to do this in 3.1, preferably without external libraries. Currently, I have httplib2 installed and access to command-prompt curl (that's how I'm getting the source code for pages). Unfortunately, curl does not decode HTML entities as far as I know; I couldn't find a command to decode them in the documentation.

YES, I've tried to get Beautiful Soup to work, MANY TIMES, without success in 3.x. If you could provide EXPLICIT instructions on how to get it to work in Python 3 in an MS Windows environment, I would be very grateful.

So, to be clear, I need to turn strings like this: "Suzy &amp; John" into a string like this: "Suzy & John".

Recommended answer

You can use the function html.unescape.

In Python 3.4+ (thanks to J.F. Sebastian for the update):

import html
html.unescape('Suzy &amp; John')
# 'Suzy & John'

html.unescape('&quot;')
# '"'
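As a small illustration beyond the two calls above, html.unescape also decodes numeric character references (decimal and hexadecimal), and html.escape is the standard-library counterpart for going the other way. This is only a sketch for Python 3.4+; the sample strings and variable names are made up for illustration.

```python
import html

# Named, decimal, and hexadecimal references all decode to the same character.
named = html.unescape('Suzy &amp; John')
decimal = html.unescape('Suzy &#38; John')
hexadecimal = html.unescape('Suzy &#x26; John')
print(named, decimal, hexadecimal, sep='\n')

# html.escape goes the other way for &, < and > (and quotes by default).
escaped = html.escape('Suzy & John <3')
print(escaped)
print(html.unescape(escaped))
```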

In Python 3.3 or older:

import html.parser
# Note: HTMLParser.unescape was deprecated in Python 3.4 and removed in 3.9.
html.parser.HTMLParser().unescape('Suzy &amp; John')
# 'Suzy & John'

In Python 2:

import HTMLParser
HTMLParser.HTMLParser().unescape('Suzy &amp; John')
# u'Suzy & John'
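If the same script has to run on several of the versions above, the three variants can be folded into one import-time fallback. The following is a minimal sketch, not part of the original answer; the wrapper name unescape_entities is hypothetical, and it assumes no third-party package named html shadows the standard library on Python 2.

```python
try:
    # Python 3.4+: module-level function.
    from html import unescape
except ImportError:
    try:
        # Python 3.0 to 3.3: method on the parser class.
        from html.parser import HTMLParser
    except ImportError:
        # Python 2: the module has a different name.
        from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape


def unescape_entities(text):
    """Hypothetical wrapper: decode HTML entities in text on any version."""
    return unescape(text)


print(unescape_entities('Suzy &amp; John'))
```

Only the import differs between versions; the call site stays the same, which keeps the version check in one place.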
