How to extract HTML code from an XML file using Groovy


Question

I have this XML file and need to extract the HTML code inside each "mono" element, keeping the HTML tags themselves. I need to do this in Groovy.

Everything inside a "mono" element is HTML markup, including the divs.

Thank you in advance.

<dataset>
    <chapters>
        <chapter id="700" name="Immunology">
            <title>Immunology</title>   
            <monos>
                <mono id="382727">

                    <div>
                        <h1>blah blah</h1>
                    </div>
                    <div>
                        <p>blah blah</p>
                    </div>

                </mono>
            </monos>
        </chapter>  
        <chapter id="701" name="hematology">
            <title>Inmuno Hematology</title>    
            <monos>
                <mono id="blah blah">
                    <div>
                        <h1>blah blah</h1>
                    </div>
                    <div>
                        <div class="class1">blah blah</div>
                    </div>
                </mono>
            </monos>
        </chapter>
    </chapters>
</dataset>

I have tried:

import javax.xml.parsers.*;

xml = new XmlParser().parse("languages.xml")

println("There are " +xml.chapters.chapter.size() +" Chapters")

for (int i = 0; i < xml.chapters.chapter.size(); i++) {

    def chapter = xml.chapters.chapter[i]
    def chapterName = chapter.'@name'
    println chapterName

    println("----  Monos List ----\n\n")

    for (int j = 0; j < chapter.monos.mono.size(); j++) {
        def mono = chapter.monos.mono[j]
        println("Mono Content: " + mono.toString());
    }

    println("---- End Monos List ----\n\n")

}

But I just get the following output:

There are 2 Chapters
Immunology
---- Monos List ----

Mono Content: mono[attributes={id=382727}; value=[div[attributes={}; value=[h1[attributes={}; value=[blah blah]]]], div[attributes={}; value=[p[attributes={}; value=[blah blah]]]]]]
---- End Monos List ----

hematology
---- Monos List ----

Mono Content: mono[attributes={id=blah blah}; value=[div[attributes={}; value=[h1[attributes={}; value=[blah blah]]]], div[attributes={}; value=[div[attributes={class=class1}; value=[blah blah]]]]]]
---- End Monos List ----

Solution

import groovy.xml.*

def src="""
<dataset>
    <chapters>
        <chapter id="700" name="Immunology">
            <title>Immunology</title>   
            <monos>
                <mono id="382727">

                    <div>
                        <h1>blah blah</h1>
                    </div>
                    <div>
                        <p>blah blah</p>
                    </div>

                </mono>
            </monos>
        </chapter>  
        <chapter id="701" name="hematology">
            <title>Inmuno Hematology</title>    
            <monos>
                <mono id="blah blah">
                    <div>
                        <h1>blah blah</h1>
                    </div>
                    <div>
                        <div class="class1">blah blah</div>
                    </div>
                </mono>
            </monos>
        </chapter>
    </chapters>
</dataset>
"""

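// Parse the XML text into a GPathResult tree
def parsed = new XmlSlurper().parseText(src)

// Depth-first search ('**') for every element named 'mono'
parsed.'**'.findAll { it.name() == 'mono' }.each { mono ->
    // Serialize each direct child of mono back to markup, tags included
    mono.children().each { htmlElement ->
        println new StreamingMarkupBuilder().bind { out << htmlElement }.toString()
    }
}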
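
If the markup is needed as a string rather than printed, the same approach can be wrapped in a small helper. What follows is only a minimal sketch under some assumptions: the XML is a trimmed version of the question's sample, and the names innerHtml and htmlById are made up for illustration and are not part of the original answer.

```groovy
import groovy.xml.*

// Trimmed sample input based on the question's XML (illustrative only)
def src = '''
<dataset>
    <chapters>
        <chapter id="700" name="Immunology">
            <monos>
                <mono id="382727">
                    <div><h1>blah blah</h1></div>
                    <div><p>blah blah</p></div>
                </mono>
            </monos>
        </chapter>
    </chapters>
</dataset>
'''

def parsed = new XmlSlurper().parseText(src)

// Hypothetical helper: serialize every direct child of a node
// with StreamingMarkupBuilder and join the pieces with newlines
def innerHtml = { node ->
    node.children().collect { child ->
        new StreamingMarkupBuilder().bind { out << child }.toString()
    }.join('\n')
}

// Map each mono's id attribute to the HTML it contains
def htmlById = parsed.'**'.findAll { it.name() == 'mono' }.collectEntries { mono ->
    [(mono.'@id'.text()): innerHtml(mono)]
}

htmlById.each { id, html ->
    println "mono ${id}:"
    println html
}
```

Keeping the result in a map keyed by the mono id makes it straightforward to write each fragment to its own file or pass it on to other code, instead of relying on console output.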
