How to encode a URL to be "browserable"?

Question

Is there any way to convert a URL like this:

https://www.mysite.com/lot/of/unpleasant/folders/and/my/url with spaces &"others".xls

into

https://www.mysite.com/lot/of/unpleasant/folders/and/my/url%20with%20spaces%20&%22others%22.xls

This is similar to the rewriting Firefox does if you paste the first URL into the address bar, send the request (there will be no response unless such a site actually exists), then copy the URL back out of the navigation bar and paste it somewhere else.

Using URLEncoder#encode gives me this (undesired) output:

https%3A%2F%2Fwww.mysite.com%2Flot%2Fof%2Funpleasant%2Ffolders%2Fand%2Fmy%2Furl+with+spaces+%26%22others%22.xls

Unfortunately, I receive the whole URL as a single String, as shown at the beginning of the question, so calling URLEncoder#encode on it directly doesn't work.
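
URLEncoder implements HTML form encoding (application/x-www-form-urlencoded), not URL path encoding: it escapes every reserved character, including ':' and '/', and turns spaces into '+'. The sketch below reproduces the output above and, as a workaround that is not from the original question, applies the same encoder to each path segment separately and then replaces '+' with "%20". The class and method names are hypothetical, and URLEncoder.encode(String, Charset) needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormEncodingDemo {
    // Encodes the whole string with form encoding (what the question tried)
    static String encodeWhole(String url) {
        return URLEncoder.encode(url, StandardCharsets.UTF_8);
    }

    // Hypothetical workaround: form-encode each path segment on its own, keeping the
    // '/' separators, then turn the '+' used for spaces into "%20".
    // A literal '+' in the input was already encoded as "%2B", so only spaces are affected.
    static String encodePathSegments(String path) {
        String[] segments = path.split("/", -1); // -1 keeps empty leading/trailing segments
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(URLEncoder.encode(segments[i], StandardCharsets.UTF_8).replace("+", "%20"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String evilUrl = "https://www.mysite.com/lot/of/unpleasant/folders/and/my/url with spaces &\"others\".xls";
        System.out.println(encodeWhole(evilUrl));
        System.out.println(encodePathSegments("/my/url with spaces &\"others\".xls"));
    }
}
```

Applying the encoder per segment only helps if the URL can already be split reliably into scheme, host and path, which is the parsing problem the question runs into next.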

I naively tried:

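import java.net.URI;
import java.net.URISyntaxException;

public class NaiveUrlEncoding {
    public static void main(String[] args) throws URISyntaxException {
        String evilUrl = "https://www.mysite.com/lot/of/unpleasant/folders/and/my/url with spaces &\"others\".xls";
        // Split off the scheme ("https"); the rest is host + path
        String[] urlParts = evilUrl.split("://");
        String scheme = urlParts[0];
        // The first '/'-separated part is the host, the remaining ones are path segments
        urlParts = urlParts[1].split("/");
        String host = urlParts[0];
        // new StringBuilder('/') would call the int-capacity constructor, not append '/'
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < urlParts.length; i++) {
            sb.append('/');
            sb.append(urlParts[i]);
        }
        // The multi-argument URI constructor quotes characters that are illegal in the path
        URI uri = new URI(scheme, host, sb.toString(), null);
        System.out.println(uri.toASCIIString());
    }
}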

This gives the following (better) output:

https://www.mysite.com/lot/of/unpleasant/folders/and/my/url%20with%20spaces%20&%22others%22.xls

But I'm not sure whether there is an out-of-the-box solution to this problem and I'm racking my brain for nothing, or whether I can rely on this piece of code to solve my problem.

By the way, I already visited some resources on this topic:

Recommended answer

The problem with URLs like this is that they are only partially encoded: an out-of-the-box encoder will always encode the whole string, so I think your approach of using a custom encoder is the right one. Your code is OK; you would just need to add some validation, for instance for the case where the "evil URL" comes without the protocol part (i.e. without "https://"), unless you're sure that can never happen.
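
For both the validation point and the "out-of-the-box" question, one option (not taken from the original answer) is to let java.net.URL parse the raw string, since it is lenient about spaces and quotes and throws MalformedURLException when the protocol is missing, and then rebuild it with the seven-argument java.net.URI constructor, which quotes characters that are illegal in each component. This is a minimal sketch; the class and method names are hypothetical. Caveats: a '?' or '#' in the raw string is treated as the start of the query or fragment, the URL(String) constructor is deprecated since Java 20, and per the URI Javadoc the multi-argument constructors always quote '%', so input that already contains escapes gets double-encoded.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlToUri {
    // Parses the raw URL leniently with java.net.URL, then rebuilds it with the
    // multi-argument URI constructor, which quotes illegal characters per component
    // (e.g. space -> %20, '"' -> %22 in the path).
    @SuppressWarnings("deprecation") // new URL(String) is deprecated since Java 20
    static String toBrowserable(String rawUrl) throws MalformedURLException, URISyntaxException {
        // Throws MalformedURLException ("no protocol") when the scheme is missing
        URL url = new URL(rawUrl);
        URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(),
                url.getPath(), url.getQuery(), url.getRef());
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws Exception {
        String evilUrl = "https://www.mysite.com/lot/of/unpleasant/folders/and/my/url with spaces &\"others\".xls";
        System.out.println(toBrowserable(evilUrl));
        try {
            toBrowserable("www.mysite.com/no/protocol.xls");
        } catch (MalformedURLException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```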

I had some spare time, so I wrote an alternative custom encoder. Rather than trying to re-encode the whole thing, it scans for characters that are not allowed in a URL and encodes only those:

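import java.nio.charset.StandardCharsets;

public class SemiEncodedUrlEncoder {
    // Characters that may appear unencoded in a URL, besides ASCII letters and digits
    private static final String ALLOWED_CHARS = "!*'();:@&=+$,/?#[]-_.~";

    private static String encodeSemiEncoded(String semiEncodedUrl) {
        StringBuilder encoded = new StringBuilder();
        // Iterate by code point so characters outside the BMP are not split in two
        semiEncodedUrl.codePoints().forEach(cp -> {
            boolean allowed = cp < 128
                    && (Character.isLetterOrDigit(cp) || ALLOWED_CHARS.indexOf(cp) != -1);
            if (allowed) {
                encoded.appendCodePoint(cp);
            } else {
                // Percent-encode each UTF-8 byte ('%' is not allowed either, so it becomes %25)
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    encoded.append(String.format("%%%02X", b & 0xFF));
                }
            }
        });
        return encoded.toString();
    }
}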
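
The encoder above also escapes '%', so input that already contains an escape such as %20 comes out as %2520. If the input really is partially encoded, one possible variant (an assumption, not part of the original answer; the names are hypothetical) copies through any '%' that is followed by two hex digits and encodes everything else the same way, using UTF-8 for non-ASCII characters:

```java
import java.nio.charset.StandardCharsets;

public class SemiEncodedUrls {
    // Same allowed set as the answer's encoder; ASCII letters and digits also pass through
    private static final String ALLOWED_CHARS = "!*'();:@&=+$,/?#[]-_.~";

    // Hypothetical variant: an existing escape ('%' + two hex digits) is kept as-is
    // instead of becoming "%25..."; every other character is handled as before.
    static String encodePreservingEscapes(String url) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < url.length()) {
            int cp = url.codePointAt(i);
            if (cp == '%' && i + 2 < url.length()
                    && isAsciiHex(url.charAt(i + 1)) && isAsciiHex(url.charAt(i + 2))) {
                out.append(url, i, i + 3); // copy the existing escape unchanged
                i += 3;
                continue;
            }
            boolean allowed = cp < 128
                    && (Character.isLetterOrDigit(cp) || ALLOWED_CHARS.indexOf(cp) != -1);
            if (allowed) {
                out.appendCodePoint(cp);
            } else {
                // Percent-encode each UTF-8 byte of the character
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // ASCII-only check; Character.digit would also accept fullwidth digits and letters
    private static boolean isAsciiHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    public static void main(String[] args) {
        System.out.println(encodePreservingEscapes("https://www.mysite.com/my/url with spaces &\"others\".xls"));
        System.out.println(encodePreservingEscapes("/already%20encoded and 100% raw"));
    }
}
```

The trade-off is that a literal "%41" in the input can no longer be told apart from an encoded 'A'; which behavior is right depends on where the strings come from.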

Hope this helps.
