Jsoup css selector code (xpath code included)

Views: 141   Published: 2017/2/22 20:34:54   Tags: xpath css-selectors html-parsing jsoup tag-soup


Question

 
I am trying to parse the HTML below using jsoup, but I can't get the selector syntax right.
<div class="info"><strong>Line 1:</strong> some text 1<br>
  <b>some text 2</b><br>
  <strong>Line 3:</strong> some text 3<br>
</div>
I need to capture "some text 1", "some text 2" and "some text 3" in three separate variables.
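In this snippet, "some text 1" and "some text 3" are bare text nodes sitting directly inside the `<div>` next to the `<strong>` labels, while "some text 2" is wrapped in a `<b>`. A CSS selector matches elements, not text nodes, so one way to get the three values is to walk the div's child nodes. Because jsoup is not part of the JDK, this sketch makes an assumption: it uses the JDK's own DOM parser on a well-formed (XHTML) copy of the snippet, with `<br/>` instead of `<br>`. The class name `InfoTextWalker` is hypothetical. In jsoup, the same walk would loop over `Element.childNodes()` of the `div.info` element and check each node for `TextNode`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: assumes the input is well-formed XML (e.g. <br/> not <br>).
public class InfoTextWalker {

    // Walks the direct children of the first <div class="info"> and keeps:
    //  - non-blank text nodes (the text that follows each <strong> label)
    //  - the text of <b> elements
    // <strong> labels and <br/> elements are skipped.
    public static List<String> extract(String xhtml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("snippet is not well-formed XML", e);
        }

        List<String> values = new ArrayList<>();
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (!"info".equals(div.getAttribute("class"))) {
                continue;
            }
            NodeList children = div.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.TEXT_NODE) {
                    String text = child.getNodeValue().trim();
                    if (!text.isEmpty()) {
                        values.add(text);
                    }
                } else if (child.getNodeType() == Node.ELEMENT_NODE
                        && "b".equals(child.getNodeName())) {
                    values.add(child.getTextContent().trim());
                }
            }
            break; // only the first matching div is processed
        }
        return values;
    }

    public static void main(String[] args) {
        String xhtml = "<div class=\"info\"><strong>Line 1:</strong> some text 1<br/>"
                + "<b>some text 2</b><br/>"
                + "<strong>Line 3:</strong> some text 3<br/></div>";
        List<String> values = extract(xhtml);
        // Assumes the fixed order shown in the question: line 1, line 2, line 3
        String text1 = values.get(0);
        String text2 = values.get(1);
        String text3 = values.get(2);
        System.out.println(text1 + " | " + text2 + " | " + text3);
    }
}
```

Running `main` prints `some text 1 | some text 2 | some text 3`. Relying on position keeps the sketch short, but it would need adjusting if the number or order of lines varies between files.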

I have the XPath for the first line (line 3 should be similar), but I can't work out the equivalent CSS selector.
//div[@class='info']/strong[1]/following::text()
Please help.

On a separate note, I have a few hundred HTML files that I need to parse, extracting data from them to store in a database. Is Jsoup the best choice for this?

I'm trying to reopen this question, as I still haven't found a solution. Please help.
Solution
It really looks like Jsoup's CSS selectors can't pull text directly out of an element with mixed content, since they match elements rather than the text nodes between them. Here is a solution that uses XOM and TagSoup to evaluate the XPath you formulated:
import java.io.IOException;

import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Nodes;
import nu.xom.ParsingException;
import nu.xom.ValidityException;
import nu.xom.XPathContext;

import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.SAXException;

public class HtmlTest {
    public static void main(final String[] args) throws SAXException, ValidityException, ParsingException, IOException {
        final String html = "<div class=\"info\"><strong>Line 1:</strong> some text 1<br><b>some text 2</b><br><strong>Line 3:</strong> some text 3<br></div>";
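        // TagSoup's Parser is a SAX XMLReader that turns tag soup into well-formed XHTML
        final Parser parser = new Parser();
        final Builder builder = new Builder(parser);
        // build(String, String) parses the string itself; the second argument is the base URI
        final Document document = builder.build(html, null);
        final nu.xom.Element root = document.getRootElement();
        // TagSoup puts every element in the XHTML namespace, so the XPath needs a prefix bound to it
        final Nodes textElements = root.query("//xhtml:div[@class='info']/xhtml:strong[1]/following::text()", new XPathContext("xhtml", root.getNamespaceURI()));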
        for (int textNumber = 0; textNumber < textElements.size(); ++textNumber) {
            System.out.println(textElements.get(textNumber).toXML());
        }
    }
}
This outputs:
 some text 1
some text 2
Line 3:
 some text 3
Without knowing more about what you're trying to do, though, I'm not sure this is exactly what you want.
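The output above still includes the "Line 3:" label, because the `following::` axis also reaches the text inside the second `<strong>`. One possible refinement, sketched here as an assumption rather than as part of the original answer, adds two predicates to the same XPath: `not(parent::strong)` drops label text and `normalize-space()` drops whitespace-only nodes. To stay dependency-free, the sketch evaluates the expression with the JDK's `javax.xml.xpath` on a well-formed copy of the snippet. Without TagSoup there is no XHTML namespace, so the `xhtml:` prefixes are not needed. `InfoXPath` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: assumes well-formed input with no default namespace.
public class InfoXPath {

    // The question's expression plus two assumed filters:
    //  - [not(parent::strong)] drops the "Line N:" label text
    //  - [normalize-space()] drops whitespace-only text nodes
    static final String EXPR =
            "//div[@class='info']/strong[1]/following::text()"
            + "[not(parent::strong)][normalize-space()]";

    public static List<String> extract(String xhtml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes;
        try {
            // evaluate(String, InputSource, QName) parses the input and runs the expression
            nodes = (NodeList) xpath.evaluate(EXPR,
                    new InputSource(new StringReader(xhtml)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue().trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String xhtml = "<div class=\"info\"><strong>Line 1:</strong> some text 1<br/>"
                + "<b>some text 2</b><br/>"
                + "<strong>Line 3:</strong> some text 3<br/></div>";
        for (String value : extract(xhtml)) {
            System.out.println(value);
        }
    }
}
```

Here `main` prints `some text 1`, `some text 2` and `some text 3` on separate lines. With TagSoup's output, as in the answer, the same predicates would need the prefix (`not(parent::xhtml:strong)`) and the `XPathContext` binding.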
