How to extract HTML code from an XML file using Groovy
Problem description
I have this XML file and need to extract the HTML code inside each "mono" element, keeping the HTML tags rather than just the text. I need to do this in Groovy.
Everything inside a "mono" element is HTML markup, including the divs.
Thank you in advance.
<dataset>
    <chapters>
        <chapter id="700" name="Immunology">
            <title>Immunology</title>
            <monos>
                <mono id="382727">
                    <div>
                        <h1>blah blah</h1>
                    </div>
                    <div>
                        <p>blah blah</p>
                    </div>
                </mono>
            </monos>
        </chapter>
        <chapter id="701" name="hematology">
            <title>Inmuno Hematology</title>
            <monos>
                <mono id="blah blah">
                    <div>
                        <h1>blah blah</h1>
                    </div>
                    <div>
                        <div class="class1">blah blah</div>
                    </div>
                </mono>
            </monos>
        </chapter>
    </chapters>
</dataset>
I have tried:
import javax.xml.parsers.*;

xml = new XmlParser().parse("languages.xml")
println("There are " + xml.chapters.chapter.size() + " Chapters")
for (int i = 0; i < xml.chapters.chapter.size(); i++) {
    def chapter = xml.chapters.chapter[i]
    def chapterName = chapter.'@name'
    println chapterName
    println("---- Monos List ----\n\n")
    for (int j = 0; j < chapter.monos.mono.size(); j++) {
        def mono = chapter.monos.mono[j]
        println("Mono Content: " + mono.toString());
    }
    println("---- End Monos List ----\n\n")
}
But I just get the following output:
There are 2 Chapters
Immunology
---- Monos List ----
Mono Content: mono[attributes={id=382727}; value=[div[attributes={}; value=[h1[attributes={}; value=[blah blah]]]], div[attributes={}; value=[p[attributes={}; value=[blah blah]]]]]]
---- End Monos List ----
hematology
---- Monos List ----
Mono Content: mono[attributes={id=blah blah}; value=[div[attributes={}; value=[h1[attributes={}; value=[blah blah]]]], div[attributes={}; value=[div[attributes={class=class1}; value=[blah blah]]]]]]
---- End Monos List ----
Recommended answer
import groovy.xml.*

def src = """
<dataset>
    <chapters>
        <chapter id="700" name="Immunology">
            <title>Immunology</title>
            <monos>
                <mono id="382727">
                    <div>
                        <h1>blah blah</h1>
                    </div>
                    <div>
                        <p>blah blah</p>
                    </div>
                </mono>
            </monos>
        </chapter>
        <chapter id="701" name="hematology">
            <title>Inmuno Hematology</title>
            <monos>
                <mono id="blah blah">
                    <div>
                        <h1>blah blah</h1>
                    </div>
                    <div>
                        <div class="class1">blah blah</div>
                    </div>
                </mono>
            </monos>
        </chapter>
    </chapters>
</dataset>
"""

// Parse the document; XmlSlurper keeps the element tree so nodes can be re-serialized
def parsed = new XmlSlurper().parseText(src)

// '**' walks the whole tree depth-first; keep only the <mono> elements
parsed.'**'.findAll { it.name() == 'mono' }.each { mono ->
    // Each direct child of <mono> is one HTML fragment
    mono.children().each { htmlElement ->
        // Writing the node to 'out' inside bind {} emits it with its tags intact
        println new StreamingMarkupBuilder().bind { out << htmlElement }.toString()
    }
}
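If the markup is needed as a value rather than printed, the same XmlSlurper and StreamingMarkupBuilder approach can be wrapped in a small function. The following is only a sketch: the helper name extractMonoHtml, the choice of a map keyed by the mono id attribute, and the trimmed-down sample document are assumptions made for illustration, not part of the original answer.

```groovy
import groovy.xml.*

// Hypothetical helper, not part of the original answer:
// maps each mono's id attribute to the markup of its children.
Map<String, String> extractMonoHtml(String xmlText) {
    def parsed = new XmlSlurper().parseText(xmlText)
    def result = [:]
    parsed.'**'.findAll { it.name() == 'mono' }.each { mono ->
        // Serialize every direct child of <mono> with its tags, one fragment per entry
        def fragments = mono.children().collect { child ->
            new StreamingMarkupBuilder().bind { out << child }.toString()
        }
        result[mono.@id.text()] = fragments.join('\n')
    }
    return result
}

// Trimmed-down sample in the same shape as the question's file (assumed for illustration)
def sample = '''
<dataset>
  <chapters>
    <chapter id="700" name="Immunology">
      <monos>
        <mono id="382727">
          <div><h1>blah blah</h1></div>
          <div><p>blah blah</p></div>
        </mono>
      </monos>
    </chapter>
  </chapters>
</dataset>
'''

extractMonoHtml(sample).each { id, html ->
    println "mono ${id}:"
    println html
}
```

To read from a file as in the question, new XmlSlurper().parse(new File("languages.xml")) should be usable in place of parseText. Note that StreamingMarkupBuilder re-serializes the parsed tree, so whitespace and attribute quoting in the result may differ from the source file.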