How to get text from nested span using Jsoup?

Question

I'm trying to get the text in the span

using the code below. However, the output behaves as if the nested spans don't exist:

    Elements tags = document.select("div[id=tags]");

    for (Element tag : tags) {
        // Note: "class" is an attribute, not a tag name, so this matches nothing
        Elements child_tags = tag.getElementsByTag("class");

        String key = tag.html();
        System.out.println(key); // only as a test

        for (Element child_tag : child_tags) {
            System.out.println("\t" + child_tag.text());
        }
    }

My output is

      <hr />Tags: 
      <span id="category"></span> 
      <span id="voteSelector" class="initially_hidden"> <br /> </span>      

Answer

Assuming you are trying ...

Jsoup only gets what the browser shows as the page source. To confirm, press CTRL+U in your browser: the window that opens shows the content Jsoup will actually receive. The part you are trying to retrieve is not present in that source, which you can check with CTRL+U.

If the content is rendered using JavaScript, it will not be visible to Jsoup, so you have to use something else that runs the JavaScript and gives you the resulting markup.

Jsoup does not run JavaScript and is not a browser.
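To make this concrete, here is a minimal sketch of what Jsoup sees when a page fills in a span from a script. The HTML string is a made-up stand-in for the raw page source (the real page's markup and script are not shown in the question), and the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticSourceDemo {
    // Hypothetical raw page source: span#category is empty in the markup and
    // would only receive a nested span if a browser executed the script.
    static final String RAW_HTML =
        "<div id=\"tags\"><hr />Tags: "
        + "<span id=\"category\"></span>"
        + "<script>"
        + "var s = document.createElement('span');"
        + "s.className = 'ct-active-tag';"
        + "s.textContent = 'Fork';"
        + "document.getElementById('category').appendChild(s);"
        + "</script>"
        + "</div>";

    // Counts the spans nested inside span#category as Jsoup parses the markup.
    static int nestedSpanCount(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("span#category span").size();
    }

    public static void main(String[] args) {
        // Jsoup parses the script as inert text and never runs it.
        System.out.println(nestedSpanCount(RAW_HTML)); // prints 0
    }
}
```

The script is kept as data inside the parsed document, so the nested span it would create never exists as far as Jsoup is concerned.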

Edit

There is a workaround using Selenium. Below is working code that gets the source of the URL as reported by the browser, along with the data you are looking for:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class JsoupDummy {
    public static void main(String[] args) {
        // Point Selenium at the local geckodriver binary (adjust the path for your machine)
        System.setProperty("webdriver.gecko.driver", "D:\\thirdPartyApis\\geckodriver-v0.19.1-win32\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();

        try {
            // Load the page in a real browser so that its JavaScript runs
            driver.get("https://chesstempo.com/chess-problems/15");
            // Parse the page source reported by the browser with Jsoup
            Document doc = Jsoup.parse(driver.getPageSource());
            Elements elements = doc.select("span.ct-active-tag");
            for (Element element : elements) {
                System.out.println(element.html());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Always close the browser, even if scraping fails
            driver.quit();
        }
    }
}

You need Selenium WebDriver, which drives a browser and so lets you get the HTML content written by scripts as well.
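Once the rendered markup is available as a string, reading the nested spans is ordinary Jsoup work. The sketch below assumes the rendered page nests span.ct-active-tag elements inside div#tags. That structure, the tag names "Fork" and "Pin", and the class and method names are illustrative assumptions, not taken from the real page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RenderedTagsDemo {
    // Made-up example of rendered HTML, standing in for the string that
    // driver.getPageSource() would return after the scripts have run.
    static final String RENDERED_HTML =
        "<div id=\"tags\"><hr />Tags: "
        + "<span id=\"category\">"
        + "<span class=\"ct-active-tag\">Fork</span> "
        + "<span class=\"ct-active-tag\">Pin</span>"
        + "</span></div>";

    // Collects the text of every span.ct-active-tag found anywhere under div#tags.
    static List<String> activeTagTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element span : doc.select("div#tags span.ct-active-tag")) {
            texts.add(span.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(activeTagTexts(RENDERED_HTML)); // prints [Fork, Pin]
    }
}
```

The descendant selector "div#tags span.ct-active-tag" matches spans at any nesting depth, so there is no need to walk the intermediate span by hand.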
