How to convert composite Unicode characters to precomposed characters

Question

I have an input string that contains composite Unicode characters, like:

"Leppӓnen" == "\x004c\x0065\x0070\x0070\x04d3\x006e\x0065\x006e"

I want to convert this to use the precomposed characters, i.e.:

"Leppänen" == "\x004c\x0065\x0070\x0070\x00e4\x006e\x0065\x006e"
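To see exactly which character differs between the two strings, a quick sketch can print each code point with its Unicode name. It uses Python's `unicodedata` module purely as a convenient inspection tool; it is not part of the original question.

```python
import unicodedata

def describe(s: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

# The two strings from the question, written with explicit escapes.
source = "Lepp\u04d3nen"   # input string
target = "Lepp\u00e4nen"   # desired string

# Print both descriptions side by side and flag positions that differ.
for a, b in zip(describe(source), describe(target)):
    marker = "" if a == b else "   <-- differs"
    print(f"{a:47} {b}{marker}")
```

Only the fifth character differs. U+04D3 is CYRILLIC SMALL LETTER A WITH DIAERESIS, and U+00E4 is LATIN SMALL LETTER A WITH DIAERESIS. Both are already single precomposed code points, so what separates them is the script (Cyrillic vs. Latin), not composition.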

I have tried:

- String.Normalize() and String.Normalize(NormalizationForm)
- kernel32.dll!WideCharToMultiByte(...)
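You can check directly why String.Normalize() leaves this input unchanged. Unicode normalization only converts between canonically equivalent sequences (or, for the K forms, compatibility-equivalent ones). U+04D3 is canonically equivalent to Cyrillic U+0430 followed by combining U+0308, never to a Latin letter.

The sketch below uses Python's `unicodedata.normalize` as a stand-in. .NET's String.Normalize implements the same four Unicode normalization forms, so the results should match.

```python
import unicodedata

text = "Lepp\u04d3nen"  # input from the question (contains U+04D3)

# Show the code points produced by each Unicode normalization form.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    result = unicodedata.normalize(form, text)
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in result)
    print(f"{form:5} {codepoints}")
```

In every form, the fifth character is either U+04D3 or the pair U+0430 U+0308. U+00E4 never appears, because Unicode treats Cyrillic а and Latin a as distinct letters rather than equivalent ones.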

My last resort will be writing a method to manually look for the normalized versions of these characters and substitute the precomposed characters, but I was hoping there was a framework or Win32 function to do this.
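If no built-in function does this, the manual approach can stay small by letting normalization handle the diacritics:

1. Decompose with NFD.
2. Replace Cyrillic base letters with their Latin look-alikes.
3. Recompose with NFC.

The `CYRILLIC_TO_LATIN` table and the `latinize` helper below are hypothetical and deliberately incomplete. Which letters count as "the same" is an application decision; Unicode does not define it. The sketch is in Python. In .NET, the same three steps would be String.Normalize(NormalizationForm.FormD), a per-character lookup, and String.Normalize(NormalizationForm.FormC).

```python
import unicodedata

# Hypothetical, incomplete table of Cyrillic letters that look like Latin ones.
# Extend it to suit your data; Unicode itself does not define this mapping.
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
})

def latinize(text: str) -> str:
    """Swap Cyrillic look-alike base letters for Latin ones, keeping diacritics."""
    decomposed = unicodedata.normalize("NFD", text)   # U+04D3 -> U+0430 U+0308
    swapped = decomposed.translate(CYRILLIC_TO_LATIN)  # U+0430 -> U+0061
    return unicodedata.normalize("NFC", swapped)       # U+0061 U+0308 -> U+00E4

print(latinize("Lepp\u04d3nen") == "Lepp\u00e4nen")  # True
```

This rewrites every mapped Cyrillic letter. It is only appropriate when the input is known to be Latin-script text contaminated with look-alikes, as with the name in the question. Applied to genuine Russian text, it would corrupt it.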

If you have no idea what I'm talking about, see: http://en.wikipedia.org/wiki/Unicode_equivalence
To see the character sets I'm talking about, see: http://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF

Recommended answer

You can try the WideCharToMultiByte function from the unmanaged Win32 API. Reference: http://msdn.microsoft.com/en-us/library/dd374130%28v=vs.85%29.aspx
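A conversion to a legacy code page only helps if that code page has a slot for the source character. As a rough illustration, Python's `cp1252` codec can stand in for Windows-1252. This is an assumption for demonstration only. WideCharToMultiByte may additionally apply "best fit" substitutions unless WC_NO_BEST_FIT_CHARS is passed, and Python's codec does not reproduce that behaviour, so real Win32 output for U+04D3 may differ.

```python
# Python's cp1252 codec as a rough stand-in for converting to Windows-1252.
# It does NOT implement Win32 "best fit" mappings; output is illustrative only.
source = "Lepp\u04d3nen"   # Cyrillic U+04D3
target = "Lepp\u00e4nen"   # Latin U+00E4

print(target.encode("cp1252"))                    # b'Lepp\xe4nen'
print(source.encode("cp1252", errors="replace"))  # b'Lepp?nen'
```

Even when such a conversion succeeds, it produces bytes in a legacy encoding rather than a Unicode string. Any character without a mapping is lost or replaced.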

