Selenium 2: Detect content type of link destinations

Question

I am using the Selenium 2 Java API to interact with web pages. My question is: how can I detect the content type of link destinations?

Basically, this is the background: before clicking a link, I want to be sure that the response is an HTML file. If it is not, I need to handle it in another way. For example, if there is a download link for a PDF file, the application should read the contents of that URL directly instead of opening it in the browser.

The goal is an application that automatically knows whether the current location is HTML, PDF, XML or something else, so that it can use the appropriate parser to extract useful information from the document.

Update

Added bounty: I will award it to the best solution that allows me to get the content type of a given URL.

Recommended answer

As Jochen suggests, the way to get the Content-Type without also downloading the content is an HTTP HEAD request, and the Selenium WebDriver implementations do not seem to offer functionality like that. You'll have to find another library to help you fetch the content type of a URL.

A Java library that can do this is Apache HttpComponents, especially HttpClient.

(The following code is untested.)

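import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpHead;
import org.apache.http.impl.client.DefaultHttpClient;

// Send a HEAD request so that only the headers are transferred, not the body.
HttpClient httpclient = new DefaultHttpClient();
HttpHead httphead = new HttpHead("http://foo/bar");
HttpResponse response = httpclient.execute(httphead);
Header contenttypeheader = response.getFirstHeader("Content-Type"); // null if the header is absent
System.out.println(contenttypeheader);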

The project publishes JavaDoc for HttpClient; the documentation for the HttpClient interface contains a nice example.
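
If adding a dependency is not an option, the same HEAD request can probably be made with the JDK's own java.net.HttpURLConnection. The sketch below is only an illustration: the class name ContentTypeProbe and its methods are hypothetical, the timeouts are arbitrary, and it assumes the server answers HEAD requests with the same Content-Type it would send for a GET, which not every server does. It also strips parameters such as "; charset=UTF-8" so the media type can be compared before deciding whether to let the browser open the link.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper, not part of Selenium or HttpClient.
public class ContentTypeProbe {

    /** Sends a HEAD request and returns the raw Content-Type header, or null if absent. */
    public static String headContentType(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("HEAD");   // headers only, no response body
            conn.setConnectTimeout(5000);    // arbitrary timeouts for the sketch
            conn.setReadTimeout(5000);
            return conn.getContentType();    // e.g. "text/html; charset=UTF-8"
        } finally {
            conn.disconnect();
        }
    }

    /** Strips parameters such as "; charset=..." and lower-cases the media type. */
    public static String mediaType(String headerValue) {
        if (headerValue == null) {
            return "";
        }
        int semicolon = headerValue.indexOf(';');
        String type = semicolon >= 0 ? headerValue.substring(0, semicolon) : headerValue;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** True for media types that a browser would render as an HTML page. */
    public static boolean isHtml(String headerValue) {
        String type = mediaType(headerValue);
        return type.equals("text/html") || type.equals("application/xhtml+xml");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: ContentTypeProbe <url>");
            return;
        }
        String header = headContentType(args[0]);
        System.out.println(mediaType(header) + " html=" + isHtml(header));
    }
}
```

With something like this, the WebDriver code could read a link's href attribute, call isHtml(headContentType(href)), and click the link only when the result is true, handing any other URL to a PDF or XML parser instead.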
