将html页中的\ n替换为python LXML中的空格 [英] Replace `\n` in html page with space in python LXML

查看:161
本文介绍了将html页中的\ n替换为python LXML中的空格的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个不清楚的xml,并使用python lxml模块对其进行了处理.我想在进行任何处理之前用space替换内容中的所有\n,我该如何处理所有元素的文本.

I have an unclear xml and process it with python lxml module. I want replace all \n in content with space before any processing, how can I do this work for text of all elements.

修改 我的xml示例:

<root>
    <a> dsdfs\n dsf\n sdf\n</a>
    <bds> 
        <d>sdf\n\n\n\n\n\n</d>
        <d>sdf\n\n\nsdf\nsdf\n\n</d>
    </bds>
    ....
    ....
    ....
    ....
</root>

当我打印ittertext时,我不会在输出中得到它:

and i wan't to get this in output when i print ittertext:

root = #get root element
for i in root.ittertext():
   print i

dsdfs  dsf  sdf
dsdfs  dsf  sdf
sdf  nsdf sdf  

推荐答案

下面的代码将xml解析为字符串,然后将\n替换为space,然后写入新的xml文件.您可以在这之间进行其他处理,具体取决于您要执行的操作.

Below code will parse the xml into a string, then replace \n with space and then write to a new xml file. You can do other processing in between, depending what exactly you want to do.

from lxml import etree 
tree = etree.parse('some.xml') 
root = tree.getroot()
# Get the whole XML content as  string
xml_in_str = etree.tostring(root)

# Replace all \n with space
new_xml_data = xml_in_str.replace(r'\n', ' ')

# Do the processing with the new_xml_data string which is formatted

# Maybe also write to a new XML file, without the \n
with open('newxml.xml', 'w') as f:
    f.write(new_xml_data)

some.xml看起来像:

<root>
    <a> dsdfs\n dsf\n sdf\n</a>
    <bds> 
        <d>sdf\n\n\n\n\n\n</d>
        <d>sdf\n\n\nsdf\nsdf\n\n</d>
    </bds>
    <bds> 
        <d>sdf\n\n\n\n\n\n</d>
        <d>sdf\n\n\nsdf\nsdf\n\n</d>
    </bds>
    <bds> 
        <d>sdf\n\n\n\n\n\n</d>
        <d>sdf\n\n\nsdf\nsdf\n\n</d>
    </bds>
</root>

newxml.xml看起来像:

<root>
    <a> dsdfs  dsf  sdf </a>
    <bds> 
        <d>sdf      </d>
        <d>sdf   sdf sdf  </d>
    </bds>
    <bds> 
        <d>sdf      </d>
        <d>sdf   sdf sdf  </d>
    </bds>
    <bds> 
        <d>sdf      </d>
        <d>sdf   sdf sdf  </d>
    </bds>
</root>

这篇关于将html页中的\ n替换为python LXML中的空格的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆