Java: Converting from Unicode to ANSI

Question

I have the string \u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF. I need to convert it to Avwg wKsewš—i K_v ejwQ`, which is in ANSI format. How can I convert this Unicode string to ANSI characters in Java?

// Apply the custom typeface, then display the Bengali text (written here as Unicode escapes).
resultView.setTypeface(typeFace);
String str = "\u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF";
resultView.setText(str);

Recommended answer

"I need to convert it to Avwg wKsewš—i K_v ejwQ`, which is in ANSI format."

That's not ANSI format. The (misleadingly named) "ANSI" code pages in Windows are all based on ASCII, with different characters added in the high bytes. Byte 0x41 (A) as a leading letter in an ANSI code page always means Latin A, never a Bengali letter.
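To make the point about code pages concrete, here is a minimal sketch using windows-1252, one of the Windows "ANSI" code pages. The class and method names are made up for illustration, and the sketch assumes the JDK in use ships the windows-1252 charset (common, but not guaranteed by the Java specification).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class AnsiCodePageDemo {
    // Assumption: this JDK provides windows-1252 (a Western European "ANSI" code page).
    static final Charset CP1252 = Charset.forName("windows-1252");

    // True only if every character of the text has a byte in the code page.
    static boolean canEncode(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        return encoder.canEncode(text);
    }

    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for characters that cannot be mapped.
    static byte[] encode(String text) {
        return text.getBytes(CP1252);
    }

    public static void main(String[] args) {
        // Latin A is byte 0x41, exactly as in ASCII.
        System.out.println(Integer.toHexString(encode("A")[0]));
        // The first word of the question's string (U+0986 U+09AE U+09BF).
        String bengali = "\u0986\u09AE\u09BF";
        System.out.println(canEncode(bengali));
        // Each Bengali character degrades to the replacement byte.
        for (byte b : encode(bengali)) {
            System.out.println(Integer.toHexString(b));
        }
    }
}
```

Encoding to such a code page therefore cannot produce Latin letters that "mean" Bengali ones; the Bengali characters are simply unmappable.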

What I think you have is a custom symbol font that maps arbitrary glyphs to completely unrelated code points. Every such font has its own visual encoding; to convert between Unicode and the custom visual encoding you'd have to build up your own translation table by looking at the glyph for each character and matching it to the Unicode character that represents the same letter.
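A hand-built translation table of that kind could be sketched roughly as follows. Everything here is hypothetical: the class name, the method name, and above all the table entries, which are guessed from the first word of the sample in the question and not checked against any real font. The sketch also assumes Java 9 or later for Map.of, and that the font places the glyph for the vowel sign I before the consonant it follows in Unicode order, as the sample "Avwg" suggests.

```java
import java.util.Map;

final class VisualEncodingSketch {
    // Hypothetical table, guessed from the question's sample; a real table
    // must be built by inspecting the glyphs of the actual font.
    static final Map<Character, String> TABLE = Map.of(
            '\u0986', "Av", // U+0986 BENGALI LETTER AA
            '\u09AE', "g",  // U+09AE BENGALI LETTER MA
            '\u09BF', "w",  // U+09BF BENGALI VOWEL SIGN I
            ' ', " ");

    static final char VOWEL_SIGN_I = '\u09BF';

    static String lookup(char c) {
        String glyphs = TABLE.get(c);
        if (glyphs == null) {
            // Fail loudly so gaps in the table are noticed.
            throw new IllegalArgumentException(
                    String.format("No mapping for U+%04X", (int) c));
        }
        return glyphs;
    }

    static String toVisual(String unicode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < unicode.length(); i++) {
            char c = unicode.charAt(i);
            boolean nextIsSignI = i + 1 < unicode.length()
                    && unicode.charAt(i + 1) == VOWEL_SIGN_I;
            if (nextIsSignI) {
                // Visual order: emit the vowel sign's glyph first, then the
                // consonant. Handles a single consonant only, not conjuncts.
                out.append(lookup(VOWEL_SIGN_I)).append(lookup(c));
                i++; // the vowel sign has been consumed
            } else {
                out.append(lookup(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toVisual("\u0986\u09AE\u09BF"));
    }
}
```

Even this tiny case needs reordering on top of the lookup, which hints at how much font-specific work a complete converter would take.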

I would strongly advise getting a proper Unicode-aware font that supports Bengali instead. Content stuck in an arbitrary font-specific encoding is difficult to deal with (because semantically you really are dealing with a string that means "AvwgwKsewš—i K_v ejwQ", with all the editing and case-changing gotchas that implies).

Visual-encoded fonts are an unhappy relic of the time before Windows had good Unicode (or even ISCII) support. They should not be used for anything today.
