Preventing etree from resolving HTML entities when parsing HTML contents

Views: 95 Published: 2021/5/3 20:56:20 Tags: python lxml elementtree

Question

Is there any way to prevent etree from resolving HTML entities when parsing HTML contents?

from lxml import etree

# The parser resolves the entity while building the tree.
html = etree.HTML('<html><body>&amp;</body></html>')
html.find('.//body').text

This gives me '&', but I want to get '&amp;' itself.

Recommended answer

You can always pre- and post-process your data: replace '&' with u'\xfe' before feeding the input to the HTML parser, and replace u'\xfe' with '&' in the output. This only works if u'\xfe' does not otherwise occur in your data.

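from lxml import etree

# Hide every '&' behind a placeholder so the parser sees no entities.
html = etree.HTML('<html><body>&amp;</body></html>'.replace('&', u'\xfe'))
# Put the '&' back after reading the text from the tree.
html.find('.//body').text.replace(u'\xfe', '&')
# result: u'&amp;' (shown as '&amp;' on Python 3)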
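
The same round trip can be wrapped in small helpers. This is only a sketch of the placeholder approach from the answer, not an lxml feature: the names `PLACEHOLDER`, `protect_entities`, `restore_entities` and `body_text_with_entities` are hypothetical, and the private-use character U+E000 is an assumed stand-in for u'\xfe', chosen because u'\xfe' is an ordinary letter that may appear in real text.

```python
# Assumption: a private-use code point that does not occur in the input.
PLACEHOLDER = "\ue000"


def protect_entities(markup, placeholder=PLACEHOLDER):
    """Replace every '&' so the HTML parser sees no entity references."""
    if placeholder in markup:
        # The round trip would corrupt data that already contains the placeholder.
        raise ValueError("placeholder already occurs in the markup")
    return markup.replace("&", placeholder)


def restore_entities(text, placeholder=PLACEHOLDER):
    """Turn the placeholder back into '&' in text read from the tree."""
    return text.replace(placeholder, "&")


def body_text_with_entities(markup):
    """Parse markup with lxml and return the <body> text with entities unresolved."""
    from lxml import etree  # imported here so the string helpers work without lxml

    tree = etree.HTML(protect_entities(markup))
    return restore_entities(tree.find(".//body").text)
```

Calling body_text_with_entities('<html><body>&amp;</body></html>') follows the same steps as the answer's snippet, so the body text should come back with the entity left as written. The explicit check in protect_entities makes the one known limitation of this trick fail loudly instead of silently altering the text.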
