How do I keep Groovy/XmlSlurper from stripping HTML tags from a node?

Question

I'm reading an HTML file from a POST response and parsing it with XmlSlurper. The textarea node on the page contains raw HTML (not URL-encoded, which was not my choice), and when I read that node's value, Groovy strips all the tags.

Example:

<html>
    <body>
        <textarea><html><body>This has html code for some reason</body></html></textarea>
    </body>
</html>

When I parse the above and then find(...) the textarea node, I get back:

This has html code for some reason

with none of the tags. How do I keep the tags?

Recommended answer

I think you're getting the right data but printing it out wrong. XmlSlurper parses the markup inside the textarea as child elements, and printing the node shows only its text content. Try using StreamingMarkupBuilder to convert the node back into a piece of XML:

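// On Groovy 4+, add: import groovy.xml.XmlSlurper
def xml = '''<html>
            |  <body>
            |    <textarea><html><body>This has html code for some reason</body></html></textarea>
            |  </body>
            |</html>'''.stripMargin()  // remove the leading '|' margins

// The markup inside the textarea is parsed as child elements
def ta = new XmlSlurper().parseText( xml ).body.textarea

// Serialize the textarea's children back to markup
String content = new groovy.xml.StreamingMarkupBuilder().bind {
  mkp.yield ta.children()
}

assert content == '<html><body>This has html code for some reason</body></html>'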
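To illustrate why the tags only appear to be lost, here is a minimal sketch using the same sample markup as the question. It assumes Groovy 3 or later for the import (see the comment for older versions). It shows that text() returns just the concatenated text of the node's descendants, while the nested elements remain in the parsed tree:

```groovy
// Groovy 3+; on Groovy 2.x drop this import (groovy.util.XmlSlurper is imported by default)
import groovy.xml.XmlSlurper

def ta = new XmlSlurper().parseText(
    '<html><body><textarea><html><body>This has html code for some reason</body></html></textarea></body></html>'
).body.textarea

// text() concatenates the text of all descendants, so no markup appears
assert ta.text() == 'This has html code for some reason'

// The nested elements are still present in the parsed tree
assert ta.children().size() == 1
assert ta.html.size() == 1
assert ta.html.body.text() == 'This has html code for some reason'
```

Because the structure is preserved, serializing the children (for example with StreamingMarkupBuilder, as in the answer) recovers the markup.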
