Extract tags from an HTML file using Jsoup

Views: 728 | Published: 2018/12/22 19:40:22 | Tags: java, html, parsing, jsoup


Question

I am doing a structural analysis of web documents. For this I need to extract only the structure of a web document (only the tags). I found an HTML parser for Java called Jsoup, but I don't know how to use it to extract tags.

Example:

<html>
 <head>
    this is head
 </head>
 <body>
    this is body
 </body>
</html>

Output:

html,head,head,body,body,html

Recommended answer

Sounds like a depth-first traversal:

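import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupDepthFirst {

    // Returns the tag names as a comma-separated string, each tag listed on open and on close.
    private static String htmlTags(Document doc) {
        StringBuilder sb = new StringBuilder();
        htmlTags(doc.children(), sb);
        return sb.toString();
    }

    // Depth-first recursion: append the name, visit all child elements, append the name again.
    private static void htmlTags(Elements elements, StringBuilder sb) {
        for (Element el : elements) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(el.nodeName());
            htmlTags(el.children(), sb);
            sb.append(",").append(el.nodeName());
        }
    }

    public static void main(String... args) {
        String s = "<html><head>this is head </head><body>this is body</body></html>";
        Document doc = Jsoup.parse(s);
        System.out.println(htmlTags(doc));
    }
}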
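To see why a depth-first walk lists every tag twice (once when it is entered, once when it is left), here is a minimal sketch that does not depend on jsoup. `TagOrderSketch`, `TagNode`, `walk` and `render` are hypothetical names invented for this illustration; the tree is built by hand rather than parsed.

```java
import java.util.Arrays;
import java.util.List;

class TagOrderSketch {

    // Hypothetical stand-in for a parsed element: a tag name plus child elements.
    static final class TagNode {
        final String name;
        final List<TagNode> children;

        TagNode(String name, TagNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first walk: emit the name on entry, recurse into children, emit it again on exit.
    static void walk(TagNode node, StringBuilder sb) {
        if (sb.length() > 0) {
            sb.append(",");
        }
        sb.append(node.name);
        for (TagNode child : node.children) {
            walk(child, sb);
        }
        sb.append(",").append(node.name);
    }

    static String render(TagNode root) {
        StringBuilder sb = new StringBuilder();
        walk(root, sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same shape as the example document: html contains head and body.
        TagNode tree = new TagNode("html", new TagNode("head"), new TagNode("body"));
        System.out.println(render(tree)); // html,head,head,body,body,html
    }
}
```

The entry/exit pair around the recursive call is what turns a plain pre-order listing into the open/close sequence asked for in the question.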

Another solution is to use jsoup's NodeVisitor, as follows:

   SecondSolution ss = new SecondSolution();
   doc.traverse(ss);
   System.out.println(ss.sb.toString());

The visitor class:

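    // Nested in the enclosing class. Needs org.jsoup.nodes.Node, org.jsoup.nodes.Element,
    // org.jsoup.nodes.Document and org.jsoup.select.NodeVisitor.
    public static class SecondSolution implements NodeVisitor {

        StringBuilder sb = new StringBuilder();

        // Called when a node is first reached: record the opening tag.
        @Override
        public void head(Node node, int depth) {
            // Skip text nodes and the Document root itself.
            if (node instanceof Element && !(node instanceof Document)) {
                if (sb.length() > 0) {
                    sb.append(",");
                }
                sb.append(node.nodeName());
            }
        }

        // Called after all descendants have been visited: record the closing tag.
        @Override
        public void tail(Node node, int depth) {
            if (node instanceof Element && !(node instanceof Document)) {
                sb.append(",").append(node.nodeName());
            }
        }
    }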
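The visitor approach separates walking the tree from reacting to each node. The sketch below imitates that split without jsoup, to show the order in which the two callbacks fire and what a depth argument might carry. Everything in it (`VisitorOrderSketch`, `TagNode`, `Visitor`, `traverse`, `events`) is hypothetical and only modelled on the head/tail pair above; in particular, starting the depth at 0 for the node passed in is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VisitorOrderSketch {

    // Hypothetical element type; not part of jsoup.
    static final class TagNode {
        final String name;
        final List<TagNode> children;

        TagNode(String name, TagNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Hypothetical callback pair modelled on the head/tail methods used above.
    interface Visitor {
        void head(TagNode node, int depth);
        void tail(TagNode node, int depth);
    }

    // The walker owns the recursion; the visitor only reacts to the callbacks.
    static void traverse(TagNode node, int depth, Visitor visitor) {
        visitor.head(node, depth);
        for (TagNode child : node.children) {
            traverse(child, depth + 1, visitor);
        }
        visitor.tail(node, depth);
    }

    // Records every callback as "head:name:depth" or "tail:name:depth".
    static List<String> events(TagNode root) {
        final List<String> log = new ArrayList<>();
        traverse(root, 0, new Visitor() {
            @Override
            public void head(TagNode node, int depth) {
                log.add("head:" + node.name + ":" + depth);
            }

            @Override
            public void tail(TagNode node, int depth) {
                log.add("tail:" + node.name + ":" + depth);
            }
        });
        return log;
    }

    public static void main(String[] args) {
        TagNode tree = new TagNode("html", new TagNode("head"), new TagNode("body"));
        for (String event : events(tree)) {
            System.out.println(event);
        }
    }
}
```

Because the visitor holds no traversal logic, the same walker could feed a different visitor (for example one that indents by depth) without any change to `traverse`.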
