How to encode URL to avoid special characters in Java?


Problem description

I need Java code to encode a URL so that special characters such as spaces, % and & are escaped.

Solution

URL construction is tricky because different parts of the URL have different rules for what characters are allowed: for example, the plus sign is reserved in the query component of a URL because it represents a space, but in the path component of the URL, a plus sign has no special meaning and spaces are encoded as "%20".

RFC 2396 explains (in section 2.4.2) that a complete URL is always in its encoded form: you take the strings for the individual components (scheme, authority, path, etc.), encode each according to its own rules, and then combine them into the complete URL string. Trying to build a complete unencoded URL string and then encode it separately leads to subtle bugs, like spaces in the path being incorrectly changed to plus signs (which an RFC-compliant server will interpret as real plus signs, not encoded spaces).

In Java, the correct way to build a URL is with the URI class. Use one of the multi-argument constructors that takes the URL components as separate strings, and it'll escape each component correctly according to that component's rules. The toASCIIString() method gives you a properly-escaped and encoded string that you can send to a server. To decode a URL, construct a URI object using the single-string constructor and then use the accessor methods (such as getPath()) to retrieve the decoded components.
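As a minimal sketch of the approach just described, the following uses the five-argument `java.net.URI` constructor (scheme, authority, path, query, fragment) to build a URL from unencoded parts and then reads the decoded path back. The class name `UriBuildExample`, the helper `build`, and the host `example.com` are illustrative assumptions, not part of the original answer.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriBuildExample {
    // Hypothetical helper: combines unencoded components into an encoded URL string.
    static String build(String scheme, String authority, String path, String query) {
        try {
            // The multi-argument constructor quotes illegal characters per component.
            URI uri = new URI(scheme, authority, path, query, null);
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical helper: parses an already-encoded URL and returns the decoded path.
    static String decodedPath(String encodedUrl) {
        // URI.create wraps the single-string constructor and expects encoded input.
        return URI.create(encodedUrl).getPath();
    }

    public static void main(String[] args) {
        String url = build("http", "example.com", "/docs/my file+50%.txt", "q=a b");
        System.out.println(url);
        // prints http://example.com/docs/my%20file+50%25.txt?q=a%20b

        System.out.println(decodedPath(url));
        // prints /docs/my file+50%.txt
    }
}
```

Note that the plus sign in the path is left alone, the space becomes `%20`, and the literal percent sign becomes `%25`. Characters that are legal in a component, such as `&` and `=` in a query, are not escaped by these constructors, so a parameter value that contains them would need separate handling.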

Don't use the URLEncoder class! Despite the name, that class actually does HTML form encoding, not URL encoding. It's not correct to concatenate unencoded strings to make an "unencoded" URL and then pass it through a URLEncoder. Doing so will result in problems (particularly the aforementioned one regarding spaces and plus signs in the path).
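The difference can be seen in a short sketch, assuming Java 10 or later for the `URLEncoder.encode(String, Charset)` overload. The class `UrlEncoderPitfall` and its helper methods are hypothetical names introduced only for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UrlEncoderPitfall {
    // Form encoding (application/x-www-form-urlencoded): spaces become '+'.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Path encoding through the URI class: spaces become "%20".
    static String uriEncodePath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // What a URI parser recovers from an encoded path.
    static String decodePath(String rawPath) {
        return URI.create(rawPath).getPath();
    }

    public static void main(String[] args) {
        // Passing a whole URL through URLEncoder mangles the delimiters too.
        System.out.println(formEncode("http://example.com/my file.txt"));
        // prints http%3A%2F%2Fexample.com%2Fmy+file.txt

        System.out.println(formEncode("my file.txt"));      // prints my+file.txt
        System.out.println(uriEncodePath("/my file.txt"));  // prints /my%20file.txt

        // In a path, '+' decodes to a literal plus sign, not a space.
        System.out.println(decodePath("/my+file.txt"));     // prints /my+file.txt
        System.out.println(decodePath("/my%20file.txt"));   // prints /my file.txt
    }
}
```

The last two lines show the bug the answer warns about: a path produced with form encoding comes back with a plus sign where the space used to be.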
