Using PrintWriter, I am getting Chinese junk characters in browser


Problem description



I am using PrintWriter as follows to get the output in the browser:

PrintWriter pw = response.getWriter();
StringBuffer sb = getTextFromDatabase();
pw.print(sb);

However, this prints the following Chinese junk characters:

格㸳潃浭湥獴⼼㍨‾琼扡敬㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴倾獯整⁤湏›〱㈭ⴷ〲〱ㄠ㨴㌰㔺਱‬祂›教桳慷瑮丠祡歡⠊湹祡歡捀獩潣挮浯਩硅散汬湥㱴琯㹤⼼牴㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴㰾琯㹤⼼牴㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴倾獯整⁤湏›〱㈭ⴷ〲〱ㄠ㨴㐰ㄺ਱‬祂›教桳慷瑮丠祡歡⠊湹祡歡捀獩潣挮浯਩敶祲朠潯㱤琯㹤⼼牴㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴㰾琯㹤⼼牴㰾牴戠捧汯牯✽䔣䔷䔷❆㰾摴倾獯整⁤湏›〱㈭ⴷ〲〱ㄠ㨴㜱㌺ਸ਼‬祂›教桳慷瑮丠祡歡⠊湹祡歡捀獩潣挮浯਩桔獩椠⁳潴琠獥㱴琯㹤⼼牴㰾琯扡敬㰾牢⼠‾格㸳潐瑳夠畯⁲潃浭湥㱴栯㸳㰠潦浲愠瑣潩㵮䌢浯敭瑮即牥汶瑥•敭桴摯∽敧≴渠浡㵥挢浯敭瑮潆浲•湯畳浢瑩∽爠瑥牵慖楬慤整潆浲⤨∻‾琼扡敬†眠摩桴∽〳∰栠楥桧㵴㌢〰㸢ठ琼㹲琼㹤氼扡汥映牯∽慮敭㸢潃浭湥㩴猼慰汣獡㵳洢湡呤汃獡≳⨾⼼灳湡㰾氯扡汥㰾牢㸯琼硥慴敲⁡慮敭∽潣瑮湥≴椠㵤挢浯敭瑮硔䅴敲≡挠慬獳∽整瑸牡慥氠牡敧•潣獬∽㠲•潲獷∽∶㸠⼼整瑸牡慥㰾琯㹤⼼牴㰾牴㰾摴㰾慬敢潦㵲渢浡≥举浡㩥猼慰汣獡㵳洢湡呤汃獡≳⨾⼼灳湡㰾氯扡汥㰾牢㸯椼灮瑵椠㵤渢浡≥琠灹㵥琢硥≴渠浡㵥渢浡≥挠慬獳∽慮敭•慶畬㵥∢洠硡敬杮桴∽㔲∵†楳敺∽㘳⼢㰾琯㹤⼼牴㰾牴㰾摴㰾慬敢潦㵲攢慭汩㸢ⵅ慍汩㰺灳湡挠慬獳∽慭摮䍔慬獳㸢㰪猯慰㹮⼼慬敢㹬戼⽲㰾湩異⁴摩∽浥楡≬琠灹㵥琢硥≴渠浡㵥攢慭汩•汣獡㵳攢慭汩•慶畬㵥∢洠硡敬杮桴∽㔲∵†楳敺∽㘳⼢㰾琯㹤⼼牴㰾牴㰾摴㰾湩異⁴琠灹㵥猢扵業≴†慮敭∽潰瑳•慶畬㵥倢獯≴㸯⼼摴㰾琯㹲⼼慴汢㹥⼼潦浲

I tried using String instead of StringBuffer, but that didn't help. I also tried setting the content type header as follows

 response.setContentType("text/html;charset=UTF-8");

before getting the response writer, but that didn't help either.

There are no issues with the data in the DB, as I use the same data for two different purposes: in one I get the correct output, but in the other I get the above junk. The code above runs in a JSP using scriptlets, and I have also set the content type for the JSP.

Solution

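Getting Chinese characters as mojibake indicates that UTF-16LE and UTF-8 are being mixed up: somewhere between the database and the response, bytes that hold single-byte (ASCII/UTF-8) text are being decoded as UTF-16LE. UTF-16LE stores each character of the Basic Multilingual Plane in 2 bytes (4 bytes only for supplementary characters), so every two single-byte characters get fused into one 16-bit code unit. Two ASCII letters fused that way usually land in the CJK (Chinese/Japanese/Korean) ranges; for example, the bytes of "<h" (0x3C 0x68) read as little-endian UTF-16 give U+683C, 格, the first character of your output.

To fix this, every layer has to agree on one encoding: either handle the data as UTF-16LE all the way through, or store it in the DB as UTF-8 from the beginning. Since you're attempting to display it as UTF-8, I think that your DB has to be reconfigured/converted to use UTF-8 instead of UTF-16LE.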
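To see how such a mix-up turns plain HTML into Chinese characters, here is a minimal sketch that reproduces it on purpose. The class name MojibakeDemo and the helper misdecode are made up for this illustration; only standard java.lang and java.nio.charset calls are used. It encodes a short ASCII string to bytes and then decodes those bytes as UTF-16LE, which is assumed to be roughly what happens to the data on its way out of the database.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: decode the UTF-8 bytes of 'text' as if they were
    // UTF-16LE, which fuses every two bytes into one 16-bit code unit.
    public static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String original = "<h3>Comments"; // 12 ASCII characters, 12 bytes
        String garbled = misdecode(original);

        // Show which two ASCII characters ended up in each code unit:
        // the low byte is the first character, the high byte the second.
        for (int i = 0; i < garbled.length(); i++) {
            char c = garbled.charAt(i);
            System.out.printf("U+%04X <- '%c' + '%c'%n",
                    (int) c, (char) (c & 0xFF), (char) (c >> 8));
        }
    }
}
```

The first line printed is U+683C <- '<' + 'h', matching the first junk character in the question. The input has an even number of bytes on purpose; with an odd count the last byte has no partner and is decoded as a replacement character.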


Unrelated to the concrete problem: storing HTML (which is what those characters originally represented) in a database is really a bad idea ;) This was the original content:

<h3>Comments</h3> <table><tr bgcolor='#E7E7EF'><td>Posted On: 10-27-2010 14:03:51
, By: Yeshwant Nayak
(ynayak@cisco.com)
Excellent</td></tr><tr bgcolor='#E7E7EF'><td></td></tr><tr bgcolor='#E7E7EF'><td>Posted On: 10-27-2010 14:04:11
, By: Yeshwant Nayak
(ynayak@cisco.com)
very good</td></tr><tr bgcolor='#E7E7EF'><td></td></tr><tr bgcolor='#E7E7EF'><td>Posted On: 10-27-2010 14:17:36
, By: Yeshwant Nayak
(ynayak@cisco.com)
This is to test</td></tr></table><br /> <h3>Post Your Comment</h3> <form action="CommentsServlet" method="get" name="commentForm" onsubmit=" return ValidateForm();"> <table   width="300" height="300">    <tr><td><label for="name">Comment:<span class="mandTClass">*</span></label><br/><textarea name="content" id="commentTxtArea" class="textarea large" cols="28" rows="6" ></textarea></td></tr><tr><td><label for="name">Name:<span class="mandTClass">*</span></label><br/><input id="name" type="text" name="name" class="name" value="" maxlength="255"  size="36"/></td></tr><tr><td><label for="email">E-Mail:<span class="mandTClass">*</span></label><br/><input id="email" type="text" name="email" class="email" value="" maxlength="255"  size="36"/></td></tr><tr><td><input  type="submit"  name="post" value="Post"/></td></tr></table></form

Here's how you can turn this incorrectly encoded Chinese back to normal characters:

String incorrect = "格㸳潃浭湥獴⼼㍨‾琼扡敬㰾牴戠捧汯";
String original = new String(incorrect.getBytes("UTF-16LE"), "UTF-8");

Note that this should not be used as a solution! It is posted only as evidence of the root cause of the problem.
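For anyone who wants to reproduce that evidence, the round trip can be wrapped into a small runnable sketch. The class name MojibakeRoundTrip and the method recover are hypothetical names chosen for this example. It uses the StandardCharsets constants instead of charset-name strings, so there is no checked UnsupportedEncodingException to handle, and the garbled input is written with \u escapes so the source file's own encoding cannot interfere. Like the snippet above, it is a diagnostic only, not a fix.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRoundTrip {

    // Hypothetical helper: re-encode the garbled text as UTF-16LE to get the
    // raw bytes back, then decode those bytes as UTF-8.
    public static String recover(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.UTF_16LE);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first nine characters of the junk output from the question.
        String incorrect =
                "\u683C\u3E33\u6F43\u6D6D\u6E65\u7374\u2F3C\u3368\u203E";
        // Brackets make the trailing space in the result visible.
        System.out.println("[" + recover(incorrect) + "]");
    }
}
```

If the output is readable HTML, as it is here, the stored bytes were fine all along and only the decoding step is wrong.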
