Jsoup: filter out only some tags when converting HTML to text

Question

Can any jsoup expert suggest how to filter HTML to text/string? I've tried calling text() on the Document, but that strips all tags/elements. My aim is to strip only certain specified tags.

For example, I have HTML text like:

<div>hello<p>world</div>,<table><tr><td>xxx</td></tr>

and I want to get the result:

<div>hello<p>world</div>,xxx 

with the table tags filtered out.

Recommended answer

I can't test this right now, but I think you want to write a recursive function that walks the tree and prints each node based on a condition. The following is an example of what it might look like; I expect you will have to modify it to suit your needs more precisely.

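import java.util.HashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RecursivePrint {

    // Tag names (not CSS classes) to skip when printing
    private static final Set<String> ignore = new HashSet<>();
    static {
        ignore.add("table");
        // ... add the other tags to ignore
    }

    public static void main(String[] args) {
        String page_text = "<div>hello<p>world</div>,<table><tr><td>xxx</td></tr>";
        Document doc = Jsoup.parse(page_text); // the class is Jsoup, not JSoup
        recursive_print(doc.head());
        recursive_print(doc.body());
    }

    public static void recursive_print(Element el) {
        // tagName() gives the element name; className() would return its class attribute
        if (!ignore.contains(el.tagName())) {
            // html() includes all descendants, so nested content is printed
            // again when the recursion reaches each child
            System.out.println(el.html());
        }
        for (Element child : el.children())
            recursive_print(child);
    }
}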
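Because html() returns an element's whole subtree, the recursive approach still emits the table markup as part of its ancestors' output. A different technique, not part of the original answer, is jsoup's Elements.unwrap(), which removes each matched element but keeps its children (including text) in place. The sketch below assumes jsoup is on the classpath; the class name KeepSomeTags and the method stripTags are hypothetical names chosen for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class KeepSomeTags {

    // Hypothetical helper: parses html, removes every element whose tag is in
    // tagsToDrop while keeping its children in place, and returns the body markup.
    public static String stripTags(String html, String... tagsToDrop) {
        Document doc = Jsoup.parse(html);
        // Disable pretty-printing so no newlines or indentation are added
        doc.outputSettings().prettyPrint(false);
        // A comma-separated selector matches any of the listed tag names
        doc.select(String.join(", ", tagsToDrop)).unwrap();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String input = "<div>hello<p>world</div>,<table><tr><td>xxx</td></tr>";
        // jsoup's parser inserts an implicit <tbody>, so it is listed as well
        System.out.println(stripTags(input, "table", "tbody", "tr", "td"));
    }
}
```

With the question's input this should keep the div and p markup followed by ",xxx". Note that jsoup normalizes the HTML while parsing, so the dangling p tag comes back closed.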
