How to convert UTF-8 to US-ASCII in Java

Question

We have a system where customers, mainly European, enter text (in UTF-8) that has to be distributed to different systems. Most of them accept UTF-8, but now we must also distribute the text to a US system that only accepts 7-bit US-ASCII.

So now we need to translate all European characters to their nearest US-ASCII equivalents. Are there any Java libraries that help with this task?

Right now we have just started building a translation table, where Å (Swedish AA) -> A and so on. Where we don't find a match for an entered character, we log it, replace it with a question mark, and try to fix it in the next release. This seems very inefficient, and somebody else must have done something similar before.

Recommended answer

The uni2ascii program is written in C, but you could probably convert it to Java with little effort. It contains a large table of approximations (implicitly, in the switch-case statements).
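If porting that table is more than you need, a different technique is available in the JDK itself. The following is a minimal sketch, not uni2ascii's method: it assumes that dropping diacritics is an acceptable approximation, uses `java.text.Normalizer` to decompose each character, discards the combining marks, and falls back to a question mark as described in the question. The class name `AsciiFolder` and the method `fold` are made up for illustration.

```java
import java.text.Normalizer;

public class AsciiFolder {
    /** Folds text to 7-bit ASCII; anything without an ASCII base becomes '?'. */
    public static String fold(String input) {
        // NFD splits e.g. U+00C5 (A with ring) into 'A' + U+030A (combining ring above).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        decomposed.codePoints().forEach(cp -> {
            if (Character.getType(cp) == Character.NON_SPACING_MARK) {
                return; // drop the combining mark, keep only the base letter
            }
            // Letters with no canonical decomposition (U+00D8, U+00DF, ...) end up here as '?'.
            out.append(cp < 128 ? (char) cp : '?');
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u00C5ke \u00C4rlig caf\u00E9")); // prints "Ake Arlig cafe"
        System.out.println(fold("\u00D8l"));                       // prints "?l"
    }
}
```

Characters such as Ø, ß, or Æ have no canonical decomposition, so a hand-maintained table (or a port of the uni2ascii one) is still needed for them.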

Be aware that there are no universally accepted approximations: Germans want you to replace Ä with AE, while Finns and Swedes prefer just A. Your example of Å isn't obvious either: Swedes would probably just drop the ring and use A, but Danes and Norwegians might prefer the historically more correct AA.
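One way to accommodate such differences is to consult a per-language override table before falling back to a generic rule. The sketch below is only an illustration under stated assumptions: the class `LocalizedAsciiFolder`, the maps `GERMAN`, `DANISH`, and `NO_OVERRIDES`, and their contents are hypothetical, the generic fallback strips diacritics with `java.text.Normalizer` (it is not the uni2ascii table), and `Map.of` requires Java 9 or later.

```java
import java.text.Normalizer;
import java.util.Map;

public class LocalizedAsciiFolder {
    // Illustrative override tables; extend them to match what your customers expect.
    static final Map<Character, String> GERMAN = Map.of(
        '\u00C4', "AE", '\u00E4', "ae", '\u00D6', "OE", '\u00F6', "oe",
        '\u00DC', "UE", '\u00FC', "ue", '\u00DF', "ss");
    static final Map<Character, String> DANISH = Map.of('\u00C5', "AA", '\u00E5', "aa");
    static final Map<Character, String> NO_OVERRIDES = Map.of();

    /** Applies language overrides first, then strips diacritics, then falls back to '?'. */
    public static String fold(String input, Map<Character, String> overrides) {
        // Compose first so that 'A' + combining ring is looked up as the single char U+00C5.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        // Iterates UTF-16 chars, so a supplementary character becomes "??".
        for (char c : composed.toCharArray()) {
            String override = overrides.get(c);
            if (override != null) {
                out.append(override);
                continue;
            }
            // Decompose and keep only the base character, if it is ASCII.
            char base = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD).charAt(0);
            if (Character.getType(base) == Character.NON_SPACING_MARK) {
                continue; // stray combining mark with no base letter
            }
            out.append(base < 128 ? base : '?');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u00C4rger", GERMAN));       // prints "AErger"
        System.out.println(fold("\u00C5rhus", DANISH));       // prints "AArhus"
        System.out.println(fold("\u00C5rhus", NO_OVERRIDES)); // prints "Arhus"
    }
}
```

Which table to use has to come from somewhere outside the text itself, for example the customer's country or language setting, since the same character maps differently per language.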
