Convert a string containing ASCII to Unicode
I get a string from my HTML page into my Java HttpServlet. In the request I get what look like ASCII codes standing for Chinese characters:
"&#21487;&#20197;&#21578;&#35785;&#25105;"
How can I transform this string into Unicode?
HTML code:
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Find information</title>
    <link rel="stylesheet" type="text/css" href="layout.css">
  </head>
  <body>
    <form id="lookupform" name="lookupform" action="LookupServlet" method="post" accept-charset="UTF-8">
      <table id="lookuptable" align="center">
        <tr>
          <td><label>Question:</label></td>
          <td><textarea cols="30" rows="2" name="lookupstring" id="lookupstring"></textarea></td>
        </tr>
      </table>
      <input type="submit" name="Look up" id="lookup" value="Look up"/>
    </form>
  </body>
</html>
Java code:
request.setCharacterEncoding("UTF-8");
javax.servlet.http.HttpSession session = request.getSession();
LoginResult lr = (LoginResult) session.getAttribute("loginResult");
String[] question = request.getParameterValues("lookupstring");
If I print question[0] then I get this value: "&#21487;&#20197;&#21578;&#35785;&#25105;"
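There is no such thing as ASCII codes that display Chinese characters. ASCII does not represent Chinese characters. What you are seeing are HTML numeric character references: each &#NNNNN; spells out the decimal Unicode code point of one character, using only ASCII characters to do so.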
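To make the relationship between the characters and the numbers in the question concrete, here is a minimal sketch that goes the other way: it takes a Java string and writes each code point as a decimal reference. The class and method names (CodePointDemo, toNumericReferences) are made up for this illustration.

```java
class CodePointDemo {
    // Writes every Unicode code point of the string as a decimal
    // reference of the form &#NNNNN; (hypothetical helper for illustration).
    static String toNumericReferences(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.append("&#").append(cp).append(';'));
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
        // Every code point here is far above 127, the largest ASCII value.
        System.out.println(toNumericReferences(s));
    }
}
```

The output uses only ASCII characters, but the numbers themselves are Unicode code points, not ASCII codes.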
If you already have a Java string, it already has an internal Unicode representation of all its characters (US, LATIN, CHINESE). You can then encode that Java string into bytes using a Unicode encoding such as UTF-8 or UTF-16:
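String s = "可以告诉我"; // EDIT: this line won't display correctly on systems without fonts for Chinese characters
String s2 = "\u53ef\u4ee5\u544a\u8bc9\u6211"; // the same string written with Unicode escapes
byte[] utfString = s.getBytes(StandardCharsets.UTF_8); // needs import java.nio.charset.StandardCharsets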
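As a rough illustration of the difference between the string and its encoded bytes, the sketch below encodes the same five characters in UTF-8 and in UTF-16 and compares the sizes. The class and helper names are made up for this example; UTF_16BE is chosen here only so that no byte order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in big-endian UTF-16 (no byte order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
        System.out.println(s.length());     // 5 chars in the Java string
        System.out.println(utf8Length(s));  // 15: three bytes per character
        System.out.println(utf16Length(s)); // 10: two bytes per character
    }
}
```

The string itself is the same in both cases; only the byte representation chosen for transport or storage differs.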
Now that I look at your updated question, you might be looking for the StringEscapeUtils class from Apache Commons Text. Its unescapeHtml4 method (called unescapeHtml in the older Commons Lang 2.x) will unescape your HTML entities into a Java string:
String s = StringEscapeUtils.unescapeHtml4("&#21487;&#20197;&#21578;&#35785;&#25105;");
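If adding a library is not an option, numeric references like the ones in the question can also be decoded by hand. The following is only a sketch under that assumption: the class name NumericEntityDecoder and its decode method are hypothetical, and it handles decimal and hexadecimal numeric references only, not named entities such as &amp;amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericEntityDecoder {
    // Matches decimal references (&#21487;) and hexadecimal ones (&#x53EF;).
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            String hex = m.group(1);
            try {
                int codePoint = (hex != null)
                        ? Integer.parseInt(hex, 16)
                        : Integer.parseInt(m.group(2));
                out.appendCodePoint(codePoint);
            } catch (IllegalArgumentException e) {
                // Number too large or not a valid code point: keep the text as is.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&#21487;&#20197;&#21578;&#35785;&#25105;"));
    }
}
```

For anything beyond numeric references, a maintained library such as Commons Text is the safer choice, since it also covers the named entities.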