How do I correctly decode unicode parameters passed to a servlet

Question

Suppose I have:

<a href="http://www.yahoo.com/" target="_yahoo" 
    title="Yahoo!&#8482;" onclick="return gateway(this);">Yahoo!</a>
<script type="text/javascript">
function gateway(lnk) {
    window.open(SERVLET +
        '?external_link=' + encodeURIComponent(lnk.href) +
        '&external_target=' + encodeURIComponent(lnk.target) +
        '&external_title=' + encodeURIComponent(lnk.title));
    return false;
}
</script>

I have confirmed external_title gets encoded as Yahoo!%E2%84%A2 and passed to SERVLET. If in SERVLET I do:

Writer writer = response.getWriter();
writer.write(request.getParameter("external_title"));

I get Yahoo!â„¢ in the browser. If I manually switch the browser character encoding to UTF-8, it changes to Yahoo!™ (which is what I want).

So I figured the encoding I was sending to the browser was wrong (it was Content-type: text/html; charset=ISO-8859-1). I changed SERVLET to:

response.setContentType("text/html; charset=utf-8");
Writer writer = response.getWriter();
writer.write(request.getParameter("external_title"));

Now the browser character encoding is UTF-8, but it outputs Yahoo!â¢ and I can't get the browser to render the correct character at all.

My question is: is there some combination of Content-type and/or new String(request.getParameter("external_title").getBytes(), "UTF-8") and/or something else that will result in Yahoo!™ appearing in the SERVLET output?

Recommended answer

You are nearly there. encodeURIComponent correctly encodes to UTF-8, which is what you should always use in a URL today.

The problem is that the submitted query string is getting mutilated on the way into your server-side script, because getParameter() uses ISO-8859-1 instead of UTF-8. This stems from Ancient Times before the web settled on UTF-8 for URI/IRI, but it's rather pathetic that the Servlet spec hasn't been updated to match reality, or at least provide a reliable, supported option for it.
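
To make the mangling concrete, here is a minimal sketch in plain Java with no servlet container involved. It simulates a container decoding the bytes E2 84 A2 as ISO-8859-1, then applies the common workaround of turning the string back into ISO-8859-1 bytes and decoding those as UTF-8. The class and method names are made up for illustration, and the repair is only valid on the assumption that the container really did decode as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Hypothetical stand-in for a container that decodes query bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Workaround: ISO-8859-1 maps chars 0-255 back to single bytes without loss,
    // so the original bytes can be recovered and decoded as UTF-8.
    static String repair(String mangled) {
        byte[] original = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Yahoo!" followed by the UTF-8 bytes of the trademark sign (U+2122).
        byte[] raw = {'Y', 'a', 'h', 'o', 'o', '!', (byte) 0xE2, (byte) 0x84, (byte) 0xA2};

        String mangled = decodeAsLatin1(raw);
        String repaired = repair(mangled);

        System.out.println(mangled.length());                  // 9: each byte became one char
        System.out.println(repaired.length());                 // 7: three bytes became one char
        System.out.println(repaired.equals("Yahoo!\u2122"));   // true
    }
}
```

Note that the no-argument getBytes() from the question uses the platform default charset, so it only happens to work where that default is ISO-8859-1 or a compatible single-byte encoding. Naming the charset explicitly avoids that dependency.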

(There is request.setCharacterEncoding in Servlet 2.3, but it doesn't affect query string parsing, and if a single parameter has been read before, possibly by some other framework element, it won't work at all.)

So you need to futz around with container-specific methods to get proper UTF-8, often involving stuff in server.xml. This totally sucks for distributing web apps that should work anywhere. For Tomcat see https://cwiki.apache.org/confluence/display/TOMCAT/Character+Encoding and also What's the difference between "URIEncoding" of Tomcat, Encoding Filter and request.setCharacterEncoding.
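
If changing container configuration is not an option, one portable alternative (not something the answer above proposes, just a sketch) is to skip getParameter() for these values and decode the raw query string yourself as UTF-8. In a servlet the raw, still percent-encoded string would come from request.getQueryString(). The class and method names below are hypothetical, and this simplified parser keeps only the first value of a repeated parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryStringUtf8 {
    // Percent-decodes one component as UTF-8 ('+' is treated as a space).
    static String decode(String component) {
        try {
            return URLDecoder.decode(component, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Parses a raw query string into name/value pairs, first value wins.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.putIfAbsent(decode(name), decode(value));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse(
            "external_link=http%3A%2F%2Fwww.yahoo.com%2F"
            + "&external_target=_yahoo"
            + "&external_title=Yahoo!%E2%84%A2");
        System.out.println(params.get("external_link"));                          // http://www.yahoo.com/
        System.out.println("Yahoo!\u2122".equals(params.get("external_title")));  // true
    }
}
```

This only covers parameters in the URL. For the output side, the response still needs a UTF-8 charset set before getWriter() is called, as in the question's second attempt.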
