Manipulating both Unicode and ASCII character sets in C#

Views: 107 | Published: 2017/8/16 22:06:41 | Tags: c# string encoding


Problem description

I have this mapping in my C# application:

string[,] unicode2Ascii = { { "&#3001;", "\x86" } };

&#3001; is the numeric character reference (decimal 3001, i.e. U+0BB9) for the Tamil letter "ஹ". This is the raw literal that MS Word saves as a byte sequence for that Unicode value. I am trying to map these Unicode value "strings" to a byte value under 255 (so as to accommodate systems that do not support Unicode).

I am trying to use string.Replace like this:

S = S.Replace(unicode2Ascii[0,0], unicode2Ascii[0,1]);

However, the resulting output contains a ? instead of the actual byte 0x86. Any pointers on how I could set the encoding for the second element of that array to something like windows-1252?

Or is there a better way to do this conversion?

thanks in advance

Solution

Not sure if this helps, but the Tamil codepage "57004 - ISCII Tamil" is supported by Windows.

It does not give the same translation for the example character above though. For 'ஹ' it gives 216. Perhaps a different codepage needs to be used?

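        // Requires: using System.Text;
        string tamilUnicodeString = "ஹ";

        // "x-iscii-ta" is the .NET name for code page 57004 (ISCII Tamil).
        Encoding encoding = Encoding.GetEncoding("x-iscii-ta");

        byte[] codepageBytes = encoding.GetBytes(tamilUnicodeString);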

Update

If you wish to take a Unicode file as input and transliterate its characters to get a single-byte representation, the following should do the trick. The resulting array will hold your single-byte representation, provided your dictionary maps every character:

        Dictionary<char, char> lookup = new Dictionary<char, char>
        {
            { 'ஹ', '\x86' },
            { 'இ',  '\x87' },
            //next pair...,
            //etc, etc.
        };

        string input = "ஹஇதில் உள்ள தமிழ் எழுத்துக்கள் சரியாகத் தெரிந்தால்";

        char[] chars = input.ToCharArray();

        for (int i = 0; i < chars.Length; i++)
        {
            char replaceChar;

            if (lookup.TryGetValue(chars[i], out replaceChar))
            {
                chars[i] = replaceChar;
            }
        }

        byte[] output = Encoding.GetEncoding("iso-8859-1").GetBytes(chars);
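The loop above can be wrapped in a small reusable helper. The following is only a sketch: the class name `SingleByteTransliterator` and the method `Transliterate` are hypothetical names introduced here, not part of the original answer. It relies on the fact that ISO-8859-1 maps U+0000 through U+00FF directly to the bytes 0x00 through 0xFF, so '\x86' is written as the byte 0x86. Any character that is missing from the dictionary and lies above U+00FF cannot be represented and comes out as '?' (0x3F).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (names are illustrative, not from the original answer).
public static class SingleByteTransliterator
{
    // Replaces every character found in the lookup table, then encodes the
    // result as ISO-8859-1 so that each char below U+0100 becomes one byte.
    public static byte[] Transliterate(string input, IDictionary<char, char> lookup)
    {
        char[] chars = input.ToCharArray();

        for (int i = 0; i < chars.Length; i++)
        {
            char replacement;
            if (lookup.TryGetValue(chars[i], out replacement))
            {
                chars[i] = replacement;
            }
        }

        // ISO-8859-1 keeps U+0086 as the single byte 0x86.
        // Unmapped characters above U+00FF are emitted as '?' (0x3F).
        return Encoding.GetEncoding("iso-8859-1").GetBytes(chars);
    }
}
```

With a dictionary containing { 'ஹ', '\x86' } and { 'இ', '\x87' }, calling `SingleByteTransliterator.Transliterate("ஹஇ", lookup)` yields the two bytes 0x86 and 0x87. By contrast, `Encoding.ASCII.GetBytes("\x86")` yields 0x3F, because ASCII has no mapping for U+0086. That may be the kind of substitution behind the ? seen in the question, although the question does not say which encoding was used when the string was written out.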
