How do I correctly decode unicode parameters passed to a servlet

Question

Suppose I have:

<a href="http://www.yahoo.com/" target="_yahoo" 
    title="Yahoo!&#8482;" onclick="return gateway(this);">Yahoo!</a>
<script type="text/javascript">
function gateway(lnk) {
    window.open(SERVLET +
        '?external_link=' + encodeURIComponent(lnk.href) +
        '&external_target=' + encodeURIComponent(lnk.target) +
        '&external_title=' + encodeURIComponent(lnk.title));
    return false;
}
</script>

I have confirmed external_title gets encoded as Yahoo!%E2%84%A2 and passed to SERVLET. If in SERVLET I do:

Writer writer = response.getWriter();
writer.write(request.getParameter("external_title"));

I get Yahoo!â„¢ in the browser. If I manually switch the browser character encoding to UTF-8, it changes to Yahoo!™ (which is what I want).

So I figured the encoding I was sending to the browser was wrong (it was Content-type: text/html; charset=ISO-8859-1). I changed SERVLET to:

response.setContentType("text/html; charset=utf-8");
Writer writer = response.getWriter();
writer.write(request.getParameter("external_title"));

Now the browser character encoding is UTF-8, but it outputs Yahoo!â¢ and I can't get the browser to render the correct character at all.
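To make the two symptoms concrete, here is a small stdlib-only Java sketch that replays the byte conversions. The class and method names are hypothetical, made up for illustration, and it assumes the container decoded the query string as ISO-8859-1. It also assumes the JDK provides the windows-1252 charset (OpenJDK does); browsers treat a page labelled ISO-8859-1 as windows-1252, which is why byte 0x84 shows up as „.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates getParameter() when the container decodes %E2%84%A2 as ISO-8859-1:
    // each of the three UTF-8 bytes of U+2122 (TM sign) becomes its own char.
    public static String containerDecoded() {
        byte[] utf8Bytes = "\u2122".getBytes(StandardCharsets.UTF_8); // bytes E2 84 A2
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);    // chars U+00E2 U+0084 U+00A2
    }

    // Simulates a page served as charset=ISO-8859-1: the chars are written back out
    // as bytes E2 84 A2, and the browser reads those bytes as windows-1252.
    public static String renderedAsLatin1Page(String param) {
        byte[] body = param.getBytes(StandardCharsets.ISO_8859_1);
        return new String(body, Charset.forName("windows-1252")); // U+00E2 U+201E U+00A2
    }

    public static void main(String[] args) {
        String param = containerDecoded();
        // Symptom 1 (charset=ISO-8859-1): a-circumflex, low double quote, cent sign
        System.out.println(renderedAsLatin1Page(param));
        // Symptom 2 (charset=utf-8): the page faithfully shows the three wrong chars,
        // a-circumflex, an invisible C1 control char and a cent sign
        param.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

The sketch suggests why changing only the Content-Type cannot help: by the time the servlet sees the parameter, it already holds three wrong characters instead of one ™.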

My question is: is there some combination of Content-type and/or new String(request.getParameter("external_title").getBytes(), "UTF-8"); and/or something else that will result in Yahoo!™ appearing in the SERVLET output?

Recommended answer

You are nearly there. encodeURIComponent correctly encodes to UTF-8, which is what you should always use in a URL today.

The problem is that the submitted query string is getting mutilated on the way into your server-side script, because getParameter() decodes the percent-encoded bytes as ISO-8859-1 instead of UTF-8. This stems from Ancient Times before the web settled on UTF-8 for URI/IRI, but it's rather pathetic that the Servlet spec hasn't been updated to match reality, or at least to provide a reliable, supported option for it.
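The decoding mismatch described above can be reproduced with java.net.URLDecoder. The sketch below (hypothetical class name; it uses the URLDecoder.decode(String, Charset) overload, available since Java 10) decodes the same raw value both ways and shows the common repair for an already-mangled value: recover the bytes with ISO-8859-1 and decode them as UTF-8. This is a workaround rather than a fix, and it assumes the container really used ISO-8859-1; if a container is already configured for UTF-8, applying it would corrupt the value instead.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecoding {
    // Decodes a percent-encoded value the way a container using the given charset would.
    public static String decodeAs(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    // Repair for a value that was already decoded as ISO-8859-1: ISO-8859-1 maps each
    // char back to exactly one byte, so this recovers the original UTF-8 bytes.
    // The question's getBytes() with no argument uses the platform default charset,
    // so it only works when that default happens to be ISO-8859-1.
    public static String reDecodeLatin1AsUtf8(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "Yahoo!%E2%84%A2";
        String latin1 = decodeAs(raw, StandardCharsets.ISO_8859_1); // "Yahoo!" + U+00E2 U+0084 U+00A2
        String utf8 = decodeAs(raw, StandardCharsets.UTF_8);        // "Yahoo!" + U+2122
        System.out.println(latin1.length() + " vs " + utf8.length()); // 9 vs 7
        System.out.println(utf8.equals(reDecodeLatin1AsUtf8(latin1))); // true
    }
}
```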

(There is request.setCharacterEncoding in Servlet 2.3, but it doesn't affect query string parsing, and if any parameter has already been read, possibly by some other framework component, it won't work at all.)

So you need to futz around with container-specific methods to get proper UTF-8, often involving settings in server.xml. This totally sucks for distributing web apps that should work anywhere. For Tomcat, see http://wiki.apache.org/tomcat/FAQ/CharacterEncoding and also the question "What's the difference between URIEncoding of Tomcat, Encoding Filter and request.setCharacterEncoding".
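If container configuration is not an option, one container-independent alternative (not part of the original answer) is to parse the raw query string yourself: HttpServletRequest.getQueryString() returns it without decoding, so it can be split and decoded as UTF-8. The sketch below is stdlib-only and the class name is hypothetical; it keeps only the first value for repeated names and covers only the query string, not POST bodies. In a servlet one would call Utf8QueryParser.parse(request.getQueryString()).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class Utf8QueryParser {
    // Parses a raw, still percent-encoded query string using UTF-8.
    // Keeps the first value for each name; URLDecoder also turns '+' into a space
    // and throws IllegalArgumentException on malformed % escapes.
    public static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.putIfAbsent(URLDecoder.decode(name, StandardCharsets.UTF_8),
                               URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // Same shape as the query string built by gateway() in the question
        String query = "external_link=http%3A%2F%2Fwww.yahoo.com%2F"
                + "&external_target=_yahoo&external_title=Yahoo!%E2%84%A2";
        Map<String, String> p = parse(query);
        System.out.println(p.get("external_link"));                       // http://www.yahoo.com/
        System.out.println(p.get("external_title").equals("Yahoo!\u2122")); // true
    }
}
```

Because encodeURIComponent escapes a literal + as %2B, treating + as a space here does not lose data for values sent by the gateway() function in the question.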
