将html页中的\ n替换为python LXML中的空格 [英] Replace `\n` in html page with space in python LXML
本文介绍了将html页中的\ n替换为python LXML中的空格的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有一个不清楚的xml,并使用python lxml模块对其进行了处理.我想在进行任何处理之前用space
替换内容中的所有\n
,我该如何处理所有元素的文本.
I have an unclear xml and process it with python lxml module. I want replace all \n
in content with space
before any processing, how can I do this work for text of all elements.
修改 我的xml示例:
<root>
<a> dsdfs\n dsf\n sdf\n</a>
<bds>
<d>sdf\n\n\n\n\n\n</d>
<d>sdf\n\n\nsdf\nsdf\n\n</d>
</bds>
....
....
....
....
</root>
当我打印ittertext时,我不会在输出中得到它:
and i wan't to get this in output when i print ittertext:
root = #get root element
for i in root.ittertext():
print i
dsdfs dsf sdf
dsdfs dsf sdf
sdf nsdf sdf
推荐答案
下面的代码将xml解析为字符串,然后将\n
替换为space
,然后写入新的xml文件.您可以在这之间进行其他处理,具体取决于您要执行的操作.
Below code will parse the xml into a string, then replace \n
with space
and then write to a new xml file. You can do other processing in between, depending what exactly you want to do.
from lxml import etree
tree = etree.parse('some.xml')
root = tree.getroot()
# Get the whole XML content as string
xml_in_str = etree.tostring(root)
# Replace all \n with space
new_xml_data = xml_in_str.replace(r'\n', ' ')
# Do the processing with the new_xml_data string which is formatted
# Maybe also write to a new XML file, without the \n
with open('newxml.xml', 'w') as f:
f.write(new_xml_data)
some.xml
看起来像:
<root>
<a> dsdfs\n dsf\n sdf\n</a>
<bds>
<d>sdf\n\n\n\n\n\n</d>
<d>sdf\n\n\nsdf\nsdf\n\n</d>
</bds>
<bds>
<d>sdf\n\n\n\n\n\n</d>
<d>sdf\n\n\nsdf\nsdf\n\n</d>
</bds>
<bds>
<d>sdf\n\n\n\n\n\n</d>
<d>sdf\n\n\nsdf\nsdf\n\n</d>
</bds>
</root>
newxml.xml
看起来像:
<root>
<a> dsdfs dsf sdf </a>
<bds>
<d>sdf </d>
<d>sdf sdf sdf </d>
</bds>
<bds>
<d>sdf </d>
<d>sdf sdf sdf </d>
</bds>
<bds>
<d>sdf </d>
<d>sdf sdf sdf </d>
</bds>
</root>
这篇关于将html页中的\ n替换为python LXML中的空格的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文