Jsoup links extraction


Question

Hello guys, I am trying to extract all the anchor links from AOL, but it is not working. The same code works with Yahoo and Bing. What could be the problem?

// Jsoup.connect() requires an absolute URL including the protocol,
// otherwise it throws an IllegalArgumentException ("Malformed URL").
Document document5 = Jsoup.connect("http://www.aol.com").get();
Elements links5 = document5.select("a");

for (Element link5 : links5) {
    System.out.println(link5.attr("href"));
}
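The extraction logic itself is fine and can be verified without any network access by parsing an in-memory HTML string instead of fetching a live page. The class and method names below (`LinkDemo`, `extractHrefs`) are illustrative, not part of the original question:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkDemo {
    // Collect the href attribute of every anchor in the given HTML fragment.
    static List<String> extractHrefs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        // "a[href]" skips anchors that have no href attribute at all.
        for (Element link : doc.select("a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"https://example.com/one\">one</a>"
                    + "<a href=\"https://example.com/two\">two</a></p>";
        extractHrefs(html).forEach(System.out::println);
    }
}
```

If this runs but the live fetch fails, the problem is the HTTP request (blocked client, missing protocol), not the selector.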

Answer

As per the comments on your previous question:

even after I specified the protocol ... only Google and AOL are not working; the same code is working with Yahoo, Bing and Ask ... my project is to implement a meta search engine ... I am able to extract links from Yahoo, Bing and Ask ... but it is not working with Google and AOL ... what could be the reason..??

They've blocked your request because you're acting as a robot/leecher, which may violate their terms of service. Their websites are very frequently requested, and they don't want to unnecessarily waste their bandwidth on robots/leechers which actually only need a small part of the response.

Use their public web service APIs instead of parsing the HTML of the entire website. For Google, that's for example the "Google Custom Search API". Other search engine providers offer similar web services. Note that those web services don't return bloated HTML, but compact JSON or XML data, which is much easier to parse/extract using JSON/XML parsers.
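As a minimal sketch of that approach, the Custom Search JSON API is queried via a plain HTTPS GET with `key`, `cx` (search engine id) and `q` parameters. The `YOUR_API_KEY`/`YOUR_ENGINE_ID` values below are placeholders you must replace with your own credentials; only the URL construction is shown here:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CustomSearchUrl {
    // Placeholder credentials -- substitute your own API key and engine id.
    static final String API_KEY = "YOUR_API_KEY";
    static final String ENGINE_ID = "YOUR_ENGINE_ID";

    // Build the request URL for the Custom Search JSON API.
    static String buildQueryUrl(String query) {
        return "https://www.googleapis.com/customsearch/v1"
            + "?key=" + API_KEY
            + "&cx=" + ENGINE_ID
            + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrl("jsoup link extraction"));
    }
}
```

Fetching that URL (e.g. with `java.net.http.HttpClient`) returns a compact JSON document whose `items` array carries the result links, so no HTML scraping is needed at all.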

