Using Java to extract a single value from an HTML page


Question

I am continuing work on a project that I've been at for some time now, and I have been struggling to pull some data from a website. The website has an iframe that pulls in some data from an unknown source. The data is in the iframe in a tag something like this:

<DIV id="number_forecast"><LABEL id="lblDay">9,000</LABEL></DIV>

There is a BUNCH of other crap above it but this div id / label is totally unique and is not used anywhere else in the code.
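If adding a library is not an option, the uniqueness of that id makes a dependency-free sketch possible for this one snippet. It swaps in a regular expression, which is not an HTML parser and not what the answer recommends, so it assumes the markup stays exactly as shown: a DIV with id="number_forecast" directly wrapping a LABEL that holds the number. The class name LabelExtractor is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelExtractor {
    // Assumption: the div with the unique id wraps a LABEL whose text is the value.
    // CASE_INSENSITIVE because the snippet uses upper-case tag names.
    private static final Pattern FORECAST = Pattern.compile(
            "<div[^>]*\\bid=\"number_forecast\"[^>]*>\\s*<label[^>]*>([^<]*)</label>",
            Pattern.CASE_INSENSITIVE);

    /** Returns the label text inside #number_forecast, or empty if the pattern does not match. */
    public static Optional<String> extract(String html) {
        Matcher m = FORECAST.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<DIV id=\"number_forecast\"><LABEL id=\"lblDay\">9,000</LABEL></DIV>";
        System.out.println(extract(html).orElse("not found")); // prints 9,000
    }
}
```

This breaks as soon as the markup changes shape (extra nesting, single quotes, attributes reordered in unexpected ways), which is why a real parser is the safer choice.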

Recommended answer

jsoup is probably what you want; it excels at extracting data from HTML documents.

There are many examples available showing how to use the API: http://jsoup.org/cookbook/extracting-data/selector-syntax

The process has two steps:

  • Parse the page and find the iframe's URL
  • Parse the iframe's content and extract the information you need
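The first step hinges on turning the iframe's src attribute, which is often relative, into a full URL. jsoup's absUrl("src") does this for you; the sketch below shows roughly the same resolution with the JDK's java.net.URI, so the behaviour is easy to check. The class name IframeUrl and the example.com URLs are hypothetical.

```java
import java.net.URI;

public class IframeUrl {
    /** Resolves an iframe's src against the URL of the page that contains it. */
    public static URI resolve(String pageUrl, String iframeSrc) {
        return URI.create(pageUrl).resolve(iframeSrc);
    }

    public static void main(String[] args) {
        String page = "http://example.com/stats/page.html";
        // Relative src: resolved against the page's directory.
        System.out.println(resolve(page, "frames/forecast.html"));
        // prints http://example.com/stats/frames/forecast.html
        // Root-relative src: resolved against the host.
        System.out.println(resolve(page, "/forecast.aspx"));
        // prints http://example.com/forecast.aspx
    }
}
```

This is also why the page must be parsed with its real URL as the base URI: without it, absUrl has nothing to resolve a relative src against and returns an empty string.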

The code looks like this:

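 import java.net.URL;

 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Element;

 // inputstream is the outer page's content, url its address (the base URI for absUrl)
 Document document = Jsoup.parse(inputstream, "iso-8859-1", url);

 // let's find the iframe (first() returns null if the page has none)
 Element iframe = document.select("iframe").first();

 // now load the iframe; absUrl turns a relative src into a full URL
 URL iframeUrl = new URL(iframe.absUrl("src"));
 document = Jsoup.parse(iframeUrl, 15000); // 15-second timeout

 // extract the div and its text, e.g. "9,000" (getElementById returns null if absent)
 Element div = document.getElementById("number_forecast");
 String value = div.text();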
