Convert a string containing ASCII to Unicode

Views: 995 | Published: 2018/12/22 19:19:25 | Tags: java unicode utf-8 servlets


Problem description

I pass a string from an HTML page to my Java HttpServlet. In the request I receive ASCII codes that stand for Chinese characters:

"& #21487;& #20197;& #21578;& #35785;& #25105;" (without the spaces)

How can I transform this string into Unicode?

HTML code:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Find information</title>
    <link rel="stylesheet" type="text/css" href="layout.css">
</head>
<body>

<form id="lookupform" name="lookupform" action="LookupServlet" method="post" accept-charset="UTF-8">
    <table id="lookuptable" align="center">
        <tr>
            <label>Question:</label>
            <td><textarea cols="30" rows="2" name="lookupstring" id="lookupstring"></textarea></td>
        </tr>
    </table>
    <input type="submit" name="Look up" id="lookup" value="Look up"/>
</form>

Java code:

request.setCharacterEncoding("UTF-8");
javax.servlet.http.HttpSession session = request.getSession();
LoginResult lr = (LoginResult) session.getAttribute("loginResult");
String[] question = request.getParameterValues("lookupstring");

If I print question[0] then I get this value: "& #21487;& #20197;& #21578;& #35785;& #25105;"

Solution

There is no such thing as ASCII codes that display Chinese characters. ASCII does not represent Chinese characters.
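To make that concrete, here is a minimal sketch (the class and method names are made up for illustration). It checks that the text received by the servlet consists only of ASCII characters, and that the number inside each reference, such as 21487, is a decimal Unicode code point rather than an ASCII code:

```java
class CodePointDemo {
    // Turn a decimal Unicode code point into a Java String.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // True if every char of s lies in the 7-bit ASCII range (0 to 127).
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String raw = "&#21487;&#20197;&#21578;&#35785;&#25105;";
        // The reference text itself is plain ASCII: '&', '#', digits and ';'.
        System.out.println(isAscii(raw)); // true
        // 21487 decimal is 0x53EF, the code point of the first Chinese character.
        System.out.println(fromCodePoint(21487).equals("\u53ef")); // true
    }
}
```

So the string is an ASCII-only spelling of five Unicode code points, which is why it needs unescaping rather than a charset conversion.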

If you already have a Java string, it already holds an internal Unicode representation of all its characters (US, Latin, Chinese). You can then encode that Java string into bytes using a Unicode encoding such as UTF-8 or UTF-16:

~~String s = "可以告诉我";~~ (EDIT: This line won't display correctly on systems not having fonts for Chinese characters)

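import java.nio.charset.StandardCharsets;

String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
byte[] utfString = s.getBytes(StandardCharsets.UTF_8); // getBytes returns a byte array, not a single byte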
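As a rough illustration of the encoding step, the following sketch (class and helper names are hypothetical) encodes the same five characters as UTF-8 and as UTF-16 and decodes the UTF-8 bytes back again. It uses UTF_16BE rather than UTF_16 so that no byte order mark is added to the output:

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encode a Java string as UTF-8 bytes.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encode a Java string as big-endian UTF-16 bytes (no byte order mark).
    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
        System.out.println(utf8(s).length);    // 15: three bytes per character
        System.out.println(utf16be(s).length); // 10: two bytes per character
        // Decoding with the same charset restores the original string.
        String roundTrip = new String(utf8(s), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(s)); // true
    }
}
```

The string itself is the same in both cases; only the byte representation differs.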

Now that I look at your updated question, you might be looking for the StringEscapeUtils class. It comes from Apache Commons Text and unescapes HTML entities into a Java string:

import org.apache.commons.text.StringEscapeUtils;
String s = StringEscapeUtils.unescapeHtml4("&#21487;&#20197;&#21578;&#35785;&#25105;"); // in Commons Text the method is unescapeHtml4
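If adding a library is not an option, numeric character references can also be decoded with the JDK alone. The sketch below is an assumption-laden illustration, not how StringEscapeUtils is implemented: the class name NumericEntityDecoder is made up, and it handles only numeric references (decimal and hexadecimal), not named entities such as &amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericEntityDecoder {
    // Matches decimal (&#21487;) and hexadecimal (&#x53EF;) numeric character references.
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // fall back to the original text
            try {
                int codePoint = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // Number does not fit in an int: leave the reference untouched.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&#21487;&#20197;&#21578;&#35785;&#25105;");
        System.out.println(decoded.equals("\u53ef\u4ee5\u544a\u8bc9\u6211")); // true
    }
}
```

In the servlet from the question, this would be called as decode(question[0]) after reading the parameter. For full HTML entity support the library route is still the safer choice.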
