lxml: Force to convert newlines to entities
Question
Is there a way to output newlines inside text elements as &#13; entities?
Currently, newlines are inserted into the output as-is:
>>> from lxml import etree
>>> from lxml.builder import E
>>> etree.tostring(E.a('one\ntwo'), pretty_print=True)
b'<a>one\ntwo</a>\n'
Desired output:
b'<a>one&#13;two</a>\n'
Recommended answer
After looking through the lxml docs, it looks like there is no way to force certain characters to be printed as escaped entities. It also looks like the list of characters that gets escaped varies by the output encoding.
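The encoding-dependent escaping can be seen in a small experiment. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in rather than lxml itself, so it only illustrates the general behavior; the element and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Build <a> with a non-ASCII character and a newline in its text.
elem = ET.Element("a")
elem.text = "caf\u00e9\nbar"

# With the default (US-ASCII) encoding, the non-ASCII character is
# written as a numeric character reference; the newline is left alone.
ascii_out = ET.tostring(elem)

# With UTF-8, the same character is written as raw bytes instead.
utf8_out = ET.tostring(elem, encoding="utf-8")

print(ascii_out)
print(utf8_out)
```

In both cases the newline is emitted as a literal byte, which matches the behavior described in the question.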
With all of that said, I'd use BeautifulSoup's prettify() on top of lxml to get the job done:
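from bs4 import BeautifulSoup as Soup
from xml.sax.saxutils import escape

def extra_entities(s):
    # Escape &, < and > first, then turn each newline into an entity.
    return escape(s).replace('\n', '&#13;')

# 'lxml-xml' selects lxml's XML parser as the backend.
soup = Soup("<a>one\ntwo</a>", 'lxml-xml')
print(soup.prettify(formatter=extra_entities))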
Output:
<?xml version="1.0" encoding="utf-8"?>
<a>
 one&#13;two
</a>
Note that newlines should actually map to &#10; (&#13; is for carriage returns, i.e. \r), but I won't argue because I can't test the FCPXML format locally.
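If the stricter mapping is wanted, a minimal sketch could lean on the optional entities argument of xml.sax.saxutils.escape instead of a separate replace() call. The function name extra_entities mirrors the answer's helper; also escaping \r is an assumption here, not something the question asked for.

```python
from xml.sax.saxutils import escape

# Assumed mapping: line feed -> &#10;, carriage return -> &#13;.
CONTROL_ENTITIES = {"\n": "&#10;", "\r": "&#13;"}

def extra_entities(s):
    # escape() handles &, < and > first, then applies the extra mapping,
    # so the ampersands in the inserted entities are not escaped again.
    return escape(s, CONTROL_ENTITIES)

print(extra_entities("one\ntwo"))
```

A function like this can be passed as formatter= to prettify() in the same way as the answer's version.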