How to extract content from a <div> tag in Java

Problem description

I have a serious problem. I would like to extract the content from tags such as:

<div class="main-content">
    <div class="sub-content">Sub content here</div>
      Main content here </div>

The output I would expect is:

Sub content here
Main content here

I've tried using regex, but the result isn't very impressive. Using:

Pattern.compile("<div>(\\S+)</div>");

returns all the strings before the first <*/div> tag.
Could anyone help me, please?

Solution

I'd recommend avoiding regex for parsing HTML. You can easily do what you ask by using Jsoup:

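import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        String html = "<html><head/><body><div class=\"main-content\">" +
                "<div class=\"sub-content\">Sub content here</div>" +
                "Main content here </div></body></html>";
        Document document = Jsoup.parse(html);
        // Select every <div>, in document order (outer div first).
        Elements divs = document.select("div");
        for (Element div : divs) {
            // ownText() returns only the text directly inside this element,
            // excluding the text of its child elements.
            System.out.println(div.ownText());
        }
    }
}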
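
As a side note on why the pattern from the question struggles, here is a minimal sketch that uses only java.util.regex. The class name RegexDivDemo and the helper firstGroup are made up for illustration. It shows three things: the literal <div> does not match an opening tag that carries attributes, \S+ cannot span the spaces inside the text, and \S+ is greedy, so adjacent divs get merged into one match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDivDemo {
    // The pattern from the question, unchanged.
    static final Pattern DIV = Pattern.compile("<div>(\\S+)</div>");

    // Hypothetical helper: returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String input) {
        Matcher m = DIV.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Opening tag has an attribute, so the literal "<div>" never matches.
        System.out.println(firstGroup("<div class=\"sub-content\">Sub content here</div>"));
        // \S+ stops at whitespace, so text containing spaces does not match either.
        System.out.println(firstGroup("<div>Sub content here</div>"));
        // A single word without spaces is the only simple case that works.
        System.out.println(firstGroup("<div>Sub</div>"));
        // Greedy \S+ runs across the first closing tag and swallows the next div.
        System.out.println(firstGroup("<div>a</div><div>b</div>"));
    }
}
```

Patching the pattern for each of these cases tends to uncover further ones (nesting, line breaks, attribute order), which is the usual argument for handing the markup to an HTML parser instead.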


In response to comment: if you want to put the content of the div elements into an array of Strings you can simply do:

    String[] divsTexts = new String[divs.size()];
    for (int i = 0; i < divs.size(); i++) {
        divsTexts[i] = divs.get(i).ownText();
    }


In response to comment: if you have nested elements and you want to get the own text of each element, then you can use jQuery multiple selector syntax. Here's an example:

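import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        String html = "<html><head/><body><div class=\"main-content\">" +
                "<div class=\"sub-content\">" +
                "<p>a paragraph <b>with some bold text</b></p>" +
                "Sub content here</div>" +
                "Main content here </div></body></html>";
        Document document = Jsoup.parse(html);
        // A comma-separated selector matches any of the listed tags, in document order.
        Elements divs = document.select("div, p, b");
        for (Element div : divs) {
            // Print only the text that belongs directly to each matched element.
            System.out.println(div.ownText());
        }
    }
}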

The code above will parse the following HTML:

<html>
<head />
<body>
<div class="main-content">
<div class="sub-content">
<p>a paragraph <b>with some bold text</b></p>
Sub content here</div>
Main content here</div>
</body>
</html>

and print the following output:

Main content here
Sub content here
a paragraph
with some bold text
