Extracting Table Data using JSoup

Question

I'm trying to extract financial information from a table using JSoup. I've reviewed similar questions and can get their examples to work (here are two: "Extracting data using Jsoup" and "Using JSoup to extract HTML table contents").

I'm not sure why the code doesn't work on my URL.

Below are 3 different attempts. Any help would be appreciated.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Wrapper class (name is arbitrary) so the three attempts compile and run together
public class ValuationTable {
    public static void main(String[] args) {
        String s = "http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=usa&culture=en-US";

        // Attempt 1: print "first cell:second cell" for every row with more than 6 cells
        // (this URL differs slightly from s: region=USA, culture=en_US)
        try {
            Document doc = Jsoup.connect("http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=USA&culture=en_US").get();

            for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
                for (Element row : table.select("tr")) {
                    Elements tds = row.select("td");
                    if (tds.size() > 6) {
                        System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
                    }
                }
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }

        // Attempt 2: print the text of every cell in every row of the table
        try {
            Document doc = Jsoup.connect(s).get();
            for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
                for (Element row : table.select("tr")) {
                    Elements tds = row.select("td");
                    for (int i = 0; i < tds.size(); i++) {
                        System.out.println(tds.get(i).text());
                    }
                }
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }

        // Attempt 3: select the rows first, then print the cells of each row
        try {
            Document doc = Jsoup.connect(s).get();
            Elements tableElements = doc.select("table#currentValuationTable.r_table1.text2");

            Elements tableRowElements = tableElements.select(":not(thead) tr");

            for (int i = 0; i < tableRowElements.size(); i++) {
                Element row = tableRowElements.get(i);
                System.out.println("row");
                Elements rowItems = row.select("td");
                for (int j = 0; j < rowItems.size(); j++) {
                    System.out.println(rowItems.get(j).text());
                }
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
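One way to tell a selector problem from a page problem is to run the same selector against HTML you control. The sketch below assumes jsoup is on the classpath; the class name `SelectorCheck`, the method `extractRows`, and both HTML snippets are hypothetical stand-ins, not Morningstar's actual markup. In the first snippet the table is part of the static HTML. In the second, a script would insert the table in a browser, and because jsoup does not execute JavaScript, it only sees the script's text and the selector matches nothing.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectorCheck {
    // Same selector as in the attempts above.
    static final String SELECTOR = "table#currentValuationTable.r_table1.text2";

    // Hypothetical markup: the table is present in the HTML that jsoup receives.
    static final String STATIC_HTML =
        "<table id='currentValuationTable' class='r_table1 text2'>"
        + "<tr><td>Row 1</td><td>value 1</td></tr>"
        + "<tr><td>Row 2</td><td>value 2</td></tr>"
        + "</table>";

    // Hypothetical markup: an empty container that a script would fill in a browser.
    // jsoup keeps the script body as plain data, so no <table> element is created.
    static final String SCRIPT_HTML =
        "<div id='valuationHolder'></div>"
        + "<script>document.getElementById('valuationHolder').innerHTML ="
        + " '<table id=\"currentValuationTable\" class=\"r_table1 text2\"></table>';</script>";

    // Returns "first cell:second cell" for each row of every matched table.
    public static List<String> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element table : doc.select(SELECTOR)) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                if (tds.size() >= 2) {
                    out.add(tds.get(0).text() + ":" + tds.get(1).text());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extractRows(STATIC_HTML)); // prints [Row 1:value 1, Row 2:value 2]
        System.out.println(extractRows(SCRIPT_HTML)); // prints [] because the table only exists after JS runs
    }
}
```

If the static snippet works but the live URL behaves like the script snippet, the selectors in the attempts are not the problem.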

Answer

Answer provided by Psherno:

Print what the Document was able to read from the page (use System.out.println(doc);). Something tells me that your problem may be related to the fact that the HTML content you are looking for is added dynamically by JavaScript in the browser, which Jsoup can't do since it doesn't support JavaScript. In that case you should use a more powerful tool, such as a web driver (e.g. Selenium).
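The answer's two suggestions can be combined roughly as follows: first check whether the document jsoup downloaded contains the table, and only if it does not, let a real browser render the page through Selenium and hand the resulting HTML back to jsoup. This is a sketch, not a tested scraper. It assumes Selenium 4 and a working Chrome/ChromeDriver setup, the class and method names (`RenderedTableScraper`, `hasTable`, `renderWithBrowser`) are hypothetical, and it assumes the rendered page contains an element with the question's id `currentValuationTable`; whether the Morningstar URL still serves that table has not been checked.

```java
import java.io.IOException;
import java.time.Duration;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class RenderedTableScraper {
    static final String SELECTOR = "table#currentValuationTable.r_table1.text2";

    // Step 1: check what jsoup actually received.
    // An empty selection suggests the table is added later by JavaScript.
    public static boolean hasTable(Document doc) {
        return !doc.select(SELECTOR).isEmpty();
    }

    // Step 2: let a real browser run the page's JavaScript, then parse the rendered HTML with jsoup.
    // Assumes Selenium 4 and a local Chrome/ChromeDriver installation.
    public static Document renderWithBrowser(String url) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get(url);
            // Wait up to 15 s for the id used in the question's selector (assumed to appear after rendering).
            new WebDriverWait(driver, Duration.ofSeconds(15))
                .until(ExpectedConditions.presenceOfElementLocated(By.id("currentValuationTable")));
            return Jsoup.parse(driver.getPageSource(), url);
        } finally {
            driver.quit(); // always close the browser, even if the wait times out
        }
    }

    public static void main(String[] args) throws IOException {
        String url = "http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=usa&culture=en-US";

        Document raw = Jsoup.connect(url).get();
        System.out.println(raw); // inspect the raw document, as the answer suggests
        Document doc = hasTable(raw) ? raw : renderWithBrowser(url);

        for (Element table : doc.select(SELECTOR)) {
            for (Element row : table.select("tr")) {
                System.out.println(row.select("td").eachText()); // cell texts of one row
            }
        }
    }
}
```

The browser is only started when the plain jsoup fetch comes back without the table. If the page never renders an element with that id, `WebDriverWait.until` throws a `TimeoutException` instead of returning an empty result.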