JSoup: Difficulty extracting a single element

Question

For my college coding project, I am tasked with grabbing the live value of bitcoin from the internet and incorporating it into a mini "bitcoin program." The issue is that I am having difficulty extracting the value of bitcoin from certain websites. Any and all help would be greatly appreciated.

I have tried using different websites, with mixed results.

Example 1

    final String url = "https://www.coindesk.com/price/bitcoin";
    try
    {
        Document doc = Jsoup.connect(url).get();
        Element ele = doc.select("span.currency-price").first();
        final String words = ele.text();
        System.out.println(words);
    }
    catch(Exception ex)
    {
        ex.printStackTrace();
    }

Example 2

    final String url = "https://cointelegraph.com/bitcoin-price-index";
    try
    {
        Document doc = Jsoup.connect(url).get();
        Element ele = doc.select("div.price-value").first();
        final String words = ele.text();
        System.out.println(words);
    }
    catch(Exception ex)
    {
        ex.printStackTrace();
    }

Example 1 resulted in a java.lang.NullPointerException at com.mycompany.test.Test.main(Test.java:28).

Example 2 ran without problems.

Recommended answer

The site https://www.coindesk.com/price/bitcoin relies heavily on JavaScript to render its content. Jsoup can't execute JavaScript; it only parses the raw HTML document returned by the server. If the span.currency-price element is not in that raw HTML, select(...) matches nothing, first() returns null, and ele.text() throws the NullPointerException from Example 1.
To see what Jsoup sees, visit the page with JavaScript disabled: the main content is missing. Alternatively, open the page and press Ctrl+U to view the page source as it is before JavaScript modifies it.
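
The same check can be done from code instead of the browser. The following is only a rough sketch, assuming Java 11+ and using the JDK's own java.net.http client rather than Jsoup: it downloads the page exactly as the server sends it and looks for the class name from the selector. The class RawHtmlCheck and its helpers are hypothetical names, and a plain substring search is a crude stand-in for a real CSS selector match.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class, not part of the original question or answer.
public class RawHtmlCheck {

    // Rough check: does the raw markup mention the class name at all?
    public static boolean mentionsClass(String rawHtml, String className) {
        return rawHtml != null && rawHtml.contains(className);
    }

    // Downloads the page as the server sends it; no JavaScript is executed.
    public static String fetchRawHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String html = fetchRawHtml("https://www.coindesk.com/price/bitcoin");
        // If this prints false, the element is added later by JavaScript
        // and a selector for it cannot match in the raw document.
        System.out.println("currency-price in raw HTML: "
                + mentionsClass(html, "currency-price"));
    }
}
```

If the class name is absent from the raw HTML, a selector for it will not match in Jsoup either, so it is worth checking the result of first() for null before calling text().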
Using Chrome's developer tools (Network tab), you can see that the page makes additional AJAX requests to fetch the current exchange rates as JSON from this URL: https://production.api.coindesk.com/v1/exchangeRates
JavaScript then builds the HTML elements dynamically from this data. The page also requests a few other URLs to fetch graph data.
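
Since the rates come from a JSON endpoint, one option is to request that URL directly instead of scraping the rendered HTML. Below is a minimal sketch, assuming Java 11+ and the JDK's java.net.http client; the class name ExchangeRateFetch, the Accept header, and the 10-second timeout are illustrative choices, not something the answer specifies. The layout of the JSON is not described in the answer, so the sketch only prints the raw body, and the endpoint may have changed or may require extra headers.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical class name; only the URL comes from the answer above.
public class ExchangeRateFetch {

    public static final String RATES_URL =
            "https://production.api.coindesk.com/v1/exchangeRates";

    // Builds a plain GET request that asks for JSON.
    public static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(RATES_URL), HttpResponse.BodyHandlers.ofString());
        // Print the status and the raw JSON; parsing it is left to a JSON
        // library once the actual response structure has been inspected.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

To stay with Jsoup for the request, Jsoup.connect(url).ignoreContentType(true).execute().body() should return the same raw text; without ignoreContentType(true), Jsoup rejects content types it does not treat as HTML or XML.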
