Remove HTML tags in AppEngine Python Env (equivalent to Ruby's Sanitize)

Question

I am looking for a Python module that will help me strip HTML tags but keep the text values. I tried BeautifulSoup before, but I couldn't figure out how to do this simple task. I searched for Python modules that could do this, but they all seem to depend on other libraries that do not work well on AppEngine.

Below is sample code using Ruby's sanitize library; that is the behavior I am after in Python:

require 'rubygems'
require 'sanitize'

html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'

Sanitize.clean(html) # => 'foo'
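For comparison, here is a minimal sketch of the same tag-stripping behavior using only the Python standard library's `html.parser` module, so it needs no third-party dependencies. It assumes Python 3 (on a Python 2 runtime the module is named `HTMLParser` instead), and the names `strip_tags` and `_TextExtractor` are hypothetical. It is not a port of Sanitize, only an illustration of the "drop markup, keep text" idea.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores all tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into '&'
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags; the tags themselves
        # (<b>, <a href=...>, <img .../>) never reach this method.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of `html` with all markup removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return ''.join(parser.parts)


html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
print(strip_tags(html))  # foo
```

Note that the contents of `<script>` and `<style>` elements are also delivered to `handle_data`, so this sketch keeps that text; filter those elements out if your input may contain them.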

Thanks for your suggestions.

-e

Answer

>>> import BeautifulSoup
>>> html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
>>> bs = BeautifulSoup.BeautifulSoup(html)  
>>> bs.findAll(text=True)
[u'foo']

This gives you a list of (Unicode) strings. If you want to turn it into a single string, use ''.join(thatlist).
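The choice of separator in the join matters once the text comes from several elements. The sketch below uses a hypothetical list shaped like what `findAll(text=True)` would return for `'<p>foo</p><p>bar</p>'`; it needs no BeautifulSoup and only shows the join step.

```python
# Hypothetical text nodes, shaped like the list findAll(text=True)
# returns for '<p>foo</p><p>bar</p>'.
text_nodes = [u'foo', u'bar']

# Plain concatenation: text from adjacent elements runs together.
joined = u''.join(text_nodes)
print(joined)  # foobar

# Joining with a space keeps words from separate elements apart.
spaced = u' '.join(text_nodes)
print(spaced)  # foo bar
```

`''.join` reproduces the source text exactly, while `' '.join` is often more readable for block-level markup at the cost of possibly inserting spaces inside inline markup such as `fo<b>o</b>`.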
