lxml: Force newlines to be converted to entities

Question

Is there a way to output newlines inside text elements as &#13; entities? Currently, newlines are inserted into the output as-is:

from lxml import etree
from lxml.builder import E
print(etree.tostring(E.a('one\ntwo'), pretty_print=True))
# b'<a>one\ntwo</a>\n'

Desired output:

b'<a>one&#13;two</a>\n'

Recommended answer

After looking through the lxml docs, it looks like there is no way to force certain characters to be printed as escaped entities. It also looks like the list of characters that gets escaped varies by the output encoding.
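The encoding dependence can be seen with a short experiment. This is only a sketch: the element content is made up for illustration, and the fallback to the standard library's ElementTree is an addition so the snippet runs where lxml is not installed (the two libraries may differ in other respects).

```python
# lxml mirrors the ElementTree API; fall back to the standard library
# so the sketch still runs where lxml is not installed.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

el = etree.Element("a")
el.text = "caf\u00e9\ntwo"  # one non-ASCII character and one newline

# Default (ASCII) output: the non-ASCII character becomes a numeric reference.
ascii_out = etree.tostring(el)
# UTF-8 output: the same character is written as raw bytes instead.
utf8_out = etree.tostring(el, encoding="utf-8")

print(ascii_out)
print(utf8_out)
```

In both cases the newline in the text is written through unchanged, which matches the behaviour described in the question.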

With all of that said, I'd use BeautifulSoup's prettify() on top of lxml to get the job done:

from bs4 import BeautifulSoup as Soup
from xml.sax.saxutils import escape

def extra_entities(s):
    return escape(s).replace('\n', '&#13;')

soup = Soup("<a>one\ntwo</a>", 'lxml-xml')
print(soup.prettify(formatter=extra_entities))

Output:

<?xml version="1.0" encoding="utf-8"?>
<a>
 one&#13;two
</a>

Note that newlines should actually map to &#10; (&#13; is for carriage returns or \r) but I won't argue because I can't test FCPXML format locally.
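If the conventional mapping is what the consumer expects, a variant of the escaping function could handle both characters. The sketch below relies on the optional `entities` argument of `xml.sax.saxutils.escape`; the names `LINE_BREAK_ENTITIES` and `escape_line_breaks` are hypothetical and not part of lxml or BeautifulSoup.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping: carriage return -> &#13;, line feed -> &#10;.
LINE_BREAK_ENTITIES = {"\r": "&#13;", "\n": "&#10;"}

def escape_line_breaks(s):
    """Escape &, < and >, then turn line breaks into character references."""
    # escape() handles '&' first, so the ampersands introduced by the
    # character references are not escaped a second time.
    return escape(s, LINE_BREAK_ENTITIES)

print(escape_line_breaks("one\ntwo"))    # one&#10;two
print(escape_line_breaks("a & b\r\nc"))  # a &amp; b&#13;&#10;c
```

A function like this could presumably be passed as the `formatter` argument to `prettify()` in the same way as `extra_entities`, though whether the target format accepts `&#10;` would need to be checked against real files.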
