Blacklist and whitelist URLs in HtmlUnitDriver

Question

Blacklisting URLs in PhantomJS and GhostDriver is pretty straightforward. First initialize the driver with a handler:

PhantomJSDriver driver = new PhantomJSDriver();
driver.executePhantomJS(loadFile("/phantomjs/handlers.js"));

Then configure the handler:

this.onResourceRequested = function (requestData, networkRequest) {
    var allowedUrls = [
        /https?:\/\/localhost.*/,
        /https?:\/\/.*\.example\.com\/?.*/
    ];
    var disallowedUrls = [
        /https?:\/\/nonono\.com.*/
    ];

    function isUrlAllowed(url) {
        function matches(url) {
            return function(re) {
                return re.test(url);
            };
        }
        return allowedUrls.some(matches(url)) && !disallowedUrls.some(matches(url));
    }

    if (!isUrlAllowed(requestData.url)) {
        console.log("Aborting disallowed request (# " + requestData.id + ") to url: '" + requestData.url + "'");
        networkRequest.abort();
    }
};

I haven't found a good way to do this with HtmlUnitDriver. There's the ScriptPreProcessor mentioned in How to filter javascript from specific urls in HtmlUnit, but it uses WebClient, not HtmlUnitDriver. Any ideas?

Recommended answer

Extend HtmlUnitDriver and override modifyWebClient to install a ScriptPreProcessor (for editing script content) and an HttpWebConnection subclass (for allowing or blocking URLs):

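// Imports assume HtmlUnit 2.x (com.gargoylesoftware.htmlunit) with Selenium's
// htmlunit-driver; HtmlUnit 3.x moved these classes to org.htmlunit.
// SC_OK is assumed to come from Apache HttpClient's HttpStatus.
import static org.apache.http.HttpStatus.SC_OK;

import java.io.IOException;
import java.util.ArrayList;

import com.gargoylesoftware.htmlunit.HttpWebConnection;
import com.gargoylesoftware.htmlunit.ScriptPreProcessor;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebConnection;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.WebResponseData;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
import org.openqa.selenium.remote.DesiredCapabilities;

public class FilteringHtmlUnitDriver extends HtmlUnitDriver {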

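    // String.matches() must match the whole URL, so each pattern ends in ".*".
    // Literal dots are escaped so they do not match arbitrary characters.
    private static final String[] ALLOWED_URLS = {
            "https?://localhost.*",
            "https?://.*\\.yes\\.yes/?.*",
    };
    private static final String[] DISALLOWED_URLS = {
            "https?://spam\\.nono.*"
    };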

    public FilteringHtmlUnitDriver(DesiredCapabilities capabilities) {
        super(capabilities);
    }

    @Override
    protected WebClient modifyWebClient(WebClient client) {
        WebConnection connection = filteringWebConnection(client);
        ScriptPreProcessor preProcessor = filteringPreProcessor();

        client.setWebConnection(connection);
        client.setScriptPreProcessor(preProcessor);

        return client;
    }

    private ScriptPreProcessor filteringPreProcessor() {
        return (htmlPage, sourceCode, sourceName, lineNumber, htmlElement) -> editContent(sourceCode);
    }

    private String editContent(String sourceCode) {
        return sourceCode.replaceAll("foo", "bar");
    }

    private WebConnection filteringWebConnection(WebClient client) {
        return new HttpWebConnection(client) {
            @Override
            public WebResponse getResponse(WebRequest request) throws IOException {
                String url = request.getUrl().toString();
                WebResponse emptyResponse = new WebResponse(
                        new WebResponseData("".getBytes(), SC_OK, "", new ArrayList<>()), request, 0);

                for (String disallowed : DISALLOWED_URLS) {
                    if (url.matches(disallowed)) {
                        return emptyResponse;
                    }
                }
                for (String allowed : ALLOWED_URLS) {
                    if (url.matches(allowed)) {
                        return super.getResponse(request);
                    }
                }
                return emptyResponse;
            }
        };
    }
}

This enables both editing of content, and blocking of URLs.
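
The allow/deny decision inside getResponse can also be pulled out into a small class, so it can be unit-tested without starting a driver. The following is only a sketch of that idea: the class name UrlFilter and its isAllowed method are hypothetical and not part of HtmlUnit or Selenium. It keeps the same order as the answer (deny list first, then allow list, block by default) and precompiles the patterns once instead of on every request.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: deny list first, then allow list, otherwise blocked. */
final class UrlFilter {

    private final List<Pattern> allowed;
    private final List<Pattern> disallowed;

    UrlFilter(List<String> allowedRegexes, List<String> disallowedRegexes) {
        this.allowed = compile(allowedRegexes);
        this.disallowed = compile(disallowedRegexes);
    }

    // Compile each regex once, so requests do not pay the compilation cost.
    private static List<Pattern> compile(List<String> regexes) {
        return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    boolean isAllowed(String url) {
        // Matcher.matches() requires the whole URL to match, the same
        // semantics as String.matches() in the driver code above.
        if (disallowed.stream().anyMatch(p -> p.matcher(url).matches())) {
            return false;
        }
        return allowed.stream().anyMatch(p -> p.matcher(url).matches());
    }
}
```

With such a helper, getResponse would reduce to returning super.getResponse(request) when isAllowed(url) is true and the empty response otherwise. Note that a URL matching both lists is blocked, because the deny list is checked first.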
