JSP UTF encoding


Problem description


I'm having a hard time figuring out how to handle this problem:

I'm developing a web tool for an Italian university, and I have to display words with accents (such as è, ù, ...). Sometimes I get these words from a PostgreSQL table (UTF-8 encoded), but mostly I have to read long passages from files. These files are UTF-8 encoded XML and display fine in Smultron or any UTF-8 editor (they were created by parsing, in Python, old files that used entities such as &egrave; instead of "è").

I wrote a Java class that extracts the relevant segments from the XML file. It works like this:

String s = parseText(filename, position);

If I write the returned String to a file, everything looks fine. The problem is that if I do

out.write(s);

in the JSP page, I get strange characters. By the way, I use

String s = getWordFromPostgresql(...);

out.write(s);

in the very same JSP and it displays OK.

Any hint?

Thanks Nicola


@krosenvold

Thanks for your response. That directive is already in the page, but it doesn't work (actually it "works", but only for the strings I get from the database). I think it has something to do with reading from the files, but I can't understand it: the strings work in "java" but not in "jsp" (I can't think of a better explanation).

Here's a basic example extracted from the actual code. The method that reads from the files returns a Map from a Mark (an object representing a position in the text) to a String (containing the text).

This is in the .jsp page (with the UTF directive cited in the posts above):

    // ...
    Map<Mark, String> map = TestoMarkParser.parseMarks(...);
    out.write(map.get(m));

And this is the result:

"Fu per√≤ cos√¨ in uso il Genere Enharmonico, che quelli quali vi si esercitavano,"

If I put the same code in a Java class and substitute out.write with System.out.println, the result is this:

"Fu però così in uso il Genere Enharmonico, che quelli quali vi si esercitavano,"


I've been doing some analysis with a hex editor. Here it is:

original string: "fu però così "

ò in the XML file: C3 B2

ò as rendered by out.write() in the JSP file: E2 88 9A E2 89 A4

ò as written to file via:

FileWriter w = new FileWriter(new File("out.txt"));
w.write(s);     // s is the parsed string
w.close();

C3 B2

Printing the value of each character of the parsed string as an int:

0: 70 = F
1: 117 = u
2: 32 =  
3: 112 = p
4: 101 = e
5: 114 = r
6: 8730 = √
7: 8804 = ≤
8: 32 =  
9: 99 = c
10: 111 = o
11: 115 = s
12: 8730 = √
13: 168 = ¨
14: 10 = (line feed)
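
The numbers above can be cross-checked with a few lines of Java. This is only a sketch: the class name `MojibakeProbe` and its helpers are made up for illustration, and the idea that the file bytes were decoded as MacRoman is an assumption inferred from the dump, not something the thread confirms. The `x-MacRoman` charset is also not guaranteed to be present in every JRE, so that part is guarded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
final class MojibakeProbe {
    /** Formats bytes as upper-case hex pairs separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes the bytes with the given charset and lists each char as an int. */
    static String codeUnits(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (char c : new String(bytes, cs).toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append((int) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] inFile = {(byte) 0xC3, (byte) 0xB2}; // the two bytes of "ò" in the XML file

        // Decoded as UTF-8, the two bytes form one character, U+00F2.
        System.out.println(codeUnits(inFile, StandardCharsets.UTF_8)); // 242

        // Encoding the two characters 8730 and 8804 as UTF-8 reproduces
        // the six bytes seen in the JSP output.
        System.out.println(hex("\u221A\u2264".getBytes(StandardCharsets.UTF_8))); // E2 88 9A E2 89 A4

        // Assumption: the parser decoded the file with a Mac platform default charset.
        if (Charset.isSupported("x-MacRoman")) {
            System.out.println(codeUnits(inFile, Charset.forName("x-MacRoman")));
        }
    }
}
```

If the guarded branch prints the same two values as positions 6 and 7 of the dump, that points to the string being built with the wrong charset while reading the file, before it ever reaches out.write().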

Solution

In the JSP page directive, try setting the charset of your content type to UTF-8. When no pageEncoding attribute is given, this sets the pageEncoding to UTF-8 as well.

<%@page contentType="text/html;charset=UTF-8"%>

UTF-8 is not the default encoding in JSP, and all sorts of interesting problems arise from this. By default the underlying stream is treated as an ISO-8859-1 stream, so any Unicode text you write to it will be encoded and interpreted as ISO-8859-1. I find that setting the encoding to UTF-8 is the best solution.
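
To make the difference concrete, here is a small sketch that writes the same character through a Writer with each charset. It uses only `java.io` and an in-memory stream rather than a real JSP response; the class name `WriterEncodingDemo` is invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it stands in for the response writer of a JSP page.
final class WriterEncodingDemo {
    /** Writes text through a Writer using the given charset and returns the bytes as hex. */
    static String encodeToHex(String text, Charset cs) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : sink.toByteArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String oGrave = "\u00F2"; // ò
        System.out.println(encodeToHex(oGrave, StandardCharsets.ISO_8859_1)); // F2
        System.out.println(encodeToHex(oGrave, StandardCharsets.UTF_8));      // C3 B2
        // A character that ISO-8859-1 cannot represent is replaced by '?'.
        System.out.println(encodeToHex("\u221A", StandardCharsets.ISO_8859_1)); // 3F
    }
}
```

The browser has to decode these bytes with the same charset the writer used, which is what the charset in the contentType directive tells it.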

Edit: Furthermore, a String variable in Java is always Unicode. So you should always be able to say

System.out.println(myString);

and see the proper characters in the console window of your web server (or just stop in the debugger and examine the string). I suspect that you'll see incorrect characters when you do this, which leads me to believe you have an encoding problem when constructing the string.
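
If the string is indeed built incorrectly, one common cause is a reader that falls back to the platform default charset (for example `FileReader` on JVMs whose default is not UTF-8). The sketch below shows reading a file with an explicit charset instead. The thread does not show how `parseText` or `TestoMarkParser` opens the file, so this is an assumption about where the fix would go, and `Utf8FileReading`, `readUtf8` and `writeTemp` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the real parser's file handling is not shown in the thread.
final class Utf8FileReading {
    /** Reads the whole file, decoding its bytes as UTF-8 regardless of the platform default. */
    static String readUtf8(Path file) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes raw bytes to a temporary file so the example can run on its own. */
    static Path writeTemp(byte[] bytes) {
        try {
            Path tmp = Files.createTempFile("sample", ".xml");
            Files.write(tmp, bytes);
            tmp.toFile().deleteOnExit();
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "però" as UTF-8 bytes: p, e, r, then C3 B2 for ò.
        Path tmp = writeTemp(new byte[] {'p', 'e', 'r', (byte) 0xC3, (byte) 0xB2});
        String s = readUtf8(tmp);
        System.out.println(s.length());        // 4
        System.out.println((int) s.charAt(3)); // 242
    }
}
```

Writing the string back with a default-charset `FileWriter` is not a reliable check, because the same wrong charset applied on the way in and on the way out reproduces the original bytes. Printing the characters as ints, as done in the question, is the more telling test.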
