Parsing XML with jsoup (while avoiding <p> tags)
Question
This question is very similar in nature to this one, but for java instead of python.
<body.content>
<block class="lead_paragraph">
<p>LEAD: Two police officers responding to a reported robbery at a Brooklyn tavern early yesterday were themselves held up by the robbers, who took their revolvers and herded them into a back room with patrons, the police said.</p>
</block>
<block class="full_text">
<p>LEAD: Two police officers responding to a reported robbery at a Brooklyn tavern early yesterday were themselves held up by the robbers, who took their revolvers and herded them into a back room with patrons, the police said.</p>
</block>
What I'm trying to do is extract the text of the sentence without all the xml formatting, using jsoup.
So I'm looking for:
LEAD: Two police officers responding to a reported robbery at a Brooklyn tavern early yesterday were themselves held up by the robbers, who took their revolvers and herded them into a back room with patrons, the police said.
Update
In fact my situation is a bit different though, because I've got some additional XML formatting which I'd like to keep, i.e. <PERSON>
<block class="full_text">
<p>SCHEINMAN</PERSON>--<PERSON>Alan</PERSON>. Happy Birthday. Thirteen years, many tears. Loving memories of your smile, humor, and laughter comfort us. You are always in our hearts. Love, <PERSON>Roni</PERSON>, <PERSON>Sandy</PERSON>, <PERSON>Jarret</PERSON>, <PERSON>Greg</PERSON>, <PERSON>Kate</PERSON>, and <PERSON>Auden Gray</PERSON></p>
</block></body.content></body></nitf>
The ideal output would be:
SCHEINMAN</PERSON>--<PERSON>Alan</PERSON>. Happy Birthday. Thirteen years, many tears. Loving memories of your smile, humor, and laughter comfort us. You are always in our hearts. Love, <PERSON>Roni</PERSON>, <PERSON>Sandy</PERSON>, <PERSON>Jarret</PERSON>, <PERSON>Greg</PERSON>, <PERSON>Kate</PERSON>, and <PERSON>Auden Gray</PERSON>
My attempt so far:
BufferedReader br = new BufferedReader(new FileReader(filename));
try
{
    StringBuilder sb = new StringBuilder();
    String line = br.readLine();
    while (line != null)
    {
        sb.append(line);
        sb.append(System.lineSeparator());
        line = br.readLine();
    }
    String everything = sb.toString();
    Document doc = Jsoup.parse(everything);
    String link = doc.select("block.full_text").text();
    System.out.println(link);
}
finally
{
    br.close();
}
Recommended answer
You can do this in jsoup.
String html = "<body.content>\n"
+ " <block class=\"lead_paragraph\">\n"
+ " <p>LEAD: Two police officers responding to a reported robbery at a Brooklyn tavern early yesterday were themselves held up by the robbers, who took their revolvers and herded them into a back room with patrons, the police said.</p>\n"
+ " </block>\n"
+ " <block class=\"full_text\">\n"
+ " <p>LEAD: Two police officers responding to a reported robbery at a Brooklyn tavern early yesterday were themselves held up by the robbers, who took their revolvers and herded them into a back room with patrons, the police said.</p>\n"
+ " </block>";
Document doc = Jsoup.parse(html);
String link = doc.select("block.full_text").text();
System.out.println(link);
Output:
LEAD: Two police officers responding to a reported robbery at a Brooklyn tavern early yesterday were themselves held up by the robbers, who took their revolvers and herded them into a back room with patrons, the police said.
Update:
String html = "<block class=\"full_text\">\n"
+ " <p>SCHEINMAN</PERSON>--<PERSON>Alan</PERSON>. Happy Birthday. Thirteen years, many tears. Loving memories of your smile, humor, and laughter comfort us. You are always in our hearts. Love, <PERSON>Roni</PERSON>, <PERSON>Sandy</PERSON>, <PERSON>Jarret</PERSON>, <PERSON>Greg</PERSON>, <PERSON>Kate</PERSON>, and <PERSON>Auden Gray</PERSON></p></block></body.content></body></nitf>";
Document doc = Jsoup.parse(html);
String link = doc.select("block.full_text").html();
System.out.println(link);
Output:
<p>SCHEINMAN--
<person>
Alan
</person>. Happy Birthday. Thirteen years, many tears. Loving memories of your smile, humor, and laughter comfort us. You are always in our hearts. Love,
<person>
Roni
</person>,
<person>
Sandy
</person>,
<person>
Jarret
</person>,
<person>
Greg
</person>,
<person>
Kate
</person>, and
<person>
Auden Gray
</person></p>
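The output above still carries the <p> wrapper and shows <PERSON> lowercased to <person>, because Jsoup.parse(html) uses the HTML parser and pretty-prints the result. One possible variation, which is not part of the original answer, is to parse with jsoup's XML parser (it preserves tag case) and take the inner markup of the <p> element itself. This is only a sketch: the class name InnerMarkup and the helper innerMarkup are hypothetical, and the sample input is a shortened version of the text from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class InnerMarkup {

    // Returns the markup inside the first <p> of block.full_text,
    // without the <p> wrapper itself. Hypothetical helper for illustration.
    static String innerMarkup(String xml) {
        // The XML parser keeps tag names such as <PERSON> in their original case.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        // Avoid the line breaks and indentation that pretty-printing adds.
        doc.outputSettings().prettyPrint(false);
        Element p = doc.selectFirst("block.full_text > p");
        // html() on an element returns its inner markup only.
        return p == null ? "" : p.html();
    }

    public static void main(String[] args) {
        String xml = "<block class=\"full_text\">\n"
                + "<p><PERSON>Alan</PERSON>. Happy Birthday. Love, "
                + "<PERSON>Roni</PERSON>, and <PERSON>Auden Gray</PERSON></p>\n"
                + "</block>";
        System.out.println(innerMarkup(xml));
    }
}
```

If a block can contain several <p> elements, doc.select("block.full_text > p") returns all of them, and each element's html() can be read in a loop.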