Removing anything between XML tags and their content

Question

I need to remove everything between XML tags, especially whitespace and newlines.

For example, removing whitespace and newlines from:
</node> \n<node id="whatever">

to get:
</node><node id="whatever">

This is not meant for parsing XML by hand, but for preparing XML data before a tool parses it. To be more specific, I'm using Hpricot (Ruby) to parse XML, and unfortunately we're currently stuck on version 0.6.164. I don't know about more recent versions, but this one often returns weird nodes (objects) that contain only whitespace and line breaks. So the idea is to clean up the XML before converting it into an Hpricot document. Alternative solutions are appreciated.
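The string clean-up described above can be sketched as a single regex pass over the raw XML before it is handed to the parser. This is a minimal sketch, not the asker's code; the helper name `strip_inter_tag_whitespace` is hypothetical. It assumes that whitespace directly between a `>` and the next `<` is never significant, which does not hold for mixed content (e.g. `<b>a</b> <i>b</i>`), and it will also touch whitespace inside comments, CDATA sections, or attribute values that happen to contain `> <`.

```ruby
# Hypothetical helper: remove whitespace-only runs (spaces, tabs, newlines)
# that sit directly between a '>' and the following '<'.
# Text that contains any non-whitespace character is left untouched.
def strip_inter_tag_whitespace(xml)
  xml.gsub(/>\s+</, '><')
end

raw = "</node> \n<node id=\"whatever\">"
puts strip_inter_tag_whitespace(raw)
```

The cleaned string would then be passed to the parser (Hpricot, in the question) in place of the original.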

An example from a test: NoMethodError: undefined method `children' for "\n ":Hpricot::Text
The interesting part here is not the NoMethodError itself, which is expected, but that the Hpricot::Text element contains nothing but a newline and whitespace.

Answer

One solution is to select all blank (whitespace-only) text nodes and remove them, here using Nokogiri rather than Hpricot:

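require 'nokogiri'
doc = Nokogiri::XML(xml_source)
# normalize-space() of a whitespace-only text node is empty, so not(...) selects exactly those nodes
doc.xpath('//text()[not(normalize-space())]').remove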
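If adding Nokogiri is not an option, the same idea (find every text node, drop those that are only whitespace) can be sketched with REXML, which ships with Ruby. This swaps in a different library from the answer above, and the helper name `remove_blank_text_nodes` is hypothetical. Whitespace-only text is detected in Ruby with `strip.empty?` rather than with `normalize-space()` in the XPath expression.

```ruby
require 'rexml/document'

# Hypothetical helper: parse the XML, then detach every text node
# whose content is empty once leading/trailing whitespace is stripped.
def remove_blank_text_nodes(xml)
  doc = REXML::Document.new(xml)
  # XPath.match returns an Array, so removing nodes while iterating is safe.
  REXML::XPath.match(doc, '//text()').each do |text|
    text.remove if text.value.strip.empty?
  end
  doc
end

xml = "<root><node id=\"a\">keep me</node> \n<node id=\"whatever\"/></root>"
puts remove_blank_text_nodes(xml).to_s
```

Unlike a regex over the raw string, this works on parsed nodes, so markup is never mangled; like the Nokogiri version, it still drops whitespace that may be significant in mixed content.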
