Manipulating both unicode and ASCII character set in C#


Question

I have this mapping in my C# application

string [,] unicode2Ascii = { { "&#3001;", "\x86" } };

&#3001; is the numeric character reference for the Tamil letter "ஹ" (decimal 3001, U+0BB9). This is the literal text that MS Word saved for the Unicode value. I am trying to map these Unicode value "strings" to a byte value under 255 (so as to accommodate systems without Unicode support).

I am trying to use string.Replace like this:

S = S.Replace(unicode2Ascii[0,0], unicode2Ascii[0,1]);

However, the resulting output has a ? instead of the actual hex 0x86 stored. Any pointers on how I could set the encoding for the second element of that array to something like windows-1252?

Or is there a better way to do this conversion?

Thanks in advance.

Solution

Not sure if this helps, but the Tamil codepage "57004 - ISCII Tamil" is supported by Windows.

It does not give the same translation for the example character above though. For 'ஹ' it gives 216. Perhaps a different codepage needs to be used?

        string tamilUnicodeString = "ஹ";

        Encoding encoding = Encoding.GetEncoding("x-iscii-ta");

        byte[] codepageBytes = encoding.GetBytes(tamilUnicodeString);
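
The snippet above can be wrapped into a small runnable program. This is only a sketch, and it assumes .NET Core 3.0 or later, where legacy code pages such as "x-iscii-ta" are, as far as I know, only available after registering CodePagesEncodingProvider. The class and method names (IsciiTamilDemo, EncodeIsciiTamil) are made up for illustration.

```csharp
using System;
using System.Text;

public static class IsciiTamilDemo
{
    // Encodes text with the ISCII Tamil code page (57004, "x-iscii-ta").
    public static byte[] EncodeIsciiTamil(string text)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered first. Registering more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding iscii = Encoding.GetEncoding("x-iscii-ta");
        return iscii.GetBytes(text);
    }

    public static void Main()
    {
        // U+0BB9 is the Tamil letter ஹ, written as an escape so the source
        // file encoding does not matter.
        byte[] bytes = EncodeIsciiTamil("\u0BB9");

        // Print the bytes in hex to compare against the value reported above.
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```

Printing the bytes in hex makes it easy to check whether a given code page produces the values the target system expects before committing to it.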

Update

If you wish to take a Unicode file as input and transliterate its characters to get a single-byte representation, the following should do the trick. The resulting array will hold your single-byte representation, provided your dictionary has an entry for each character in the input:

        Dictionary<char, char> lookup = new Dictionary<char, char>
        {
            { 'ஹ', '\x86' },
            { 'இ',  '\x87' },
            //next pair...,
            //etc, etc.
        };

        string input = "ஹஇதில் உள்ள தமிழ் எழுத்துக்கள் சரியாகத் தெரிந்தால்";

        char[] chars = input.ToCharArray();

        for (int i = 0; i < chars.Length; i++)
        {
            char replaceChar;

            if (lookup.TryGetValue(chars[i], out replaceChar))
            {
                chars[i] = replaceChar;
            }
        }

        byte[] output = Encoding.GetEncoding("iso-8859-1").GetBytes(chars);
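
As a variation on the code above (not the answerer's method), the lookup can map characters straight to bytes, which avoids the final encoding step altogether. The sketch below also decodes decimal references such as &#3001; first, since that is the form the question starts from. All names here (TamilByteMapper, DecodeNumericReferences, ToLegacyBytes) are hypothetical, and the table contains only the two pairs given in the answer. The last line of Main shows one likely source of the ? in the question: an encoding that cannot represent U+0086 substitutes a question mark.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TamilByteMapper
{
    // Hypothetical table: only these two pairs appear in the answer above.
    private static readonly Dictionary<char, byte> Lookup = new Dictionary<char, byte>
    {
        { '\u0BB9', 0x86 }, // ஹ
        { '\u0B87', 0x87 }, // இ
    };

    // Turns decimal references such as "&#3001;" into the characters they name.
    // [0-9] is used instead of \d because \d also matches non-ASCII digits.
    // Out-of-range values make int.Parse or ConvertFromUtf32 throw.
    public static string DecodeNumericReferences(string text)
    {
        return Regex.Replace(text, "&#([0-9]+);",
            m => char.ConvertFromUtf32(int.Parse(m.Groups[1].Value)));
    }

    // Maps each UTF-16 char to one byte: table hit, ASCII pass-through, or '?'.
    public static byte[] ToLegacyBytes(string text)
    {
        var output = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (Lookup.TryGetValue(c, out byte mapped))
                output[i] = mapped;
            else if (c <= '\u007F')
                output[i] = (byte)c;   // plain ASCII passes through unchanged
            else
                output[i] = (byte)'?'; // unmapped: keep the gap visible
        }
        return output;
    }

    public static void Main()
    {
        string decoded = DecodeNumericReferences("&#3001;a");
        byte[] bytes = ToLegacyBytes(decoded);
        Console.WriteLine(BitConverter.ToString(bytes)); // prints 86-61

        // Encoding.ASCII cannot represent U+0086 and falls back to '?' (63).
        Console.WriteLine(Encoding.ASCII.GetBytes("\u0086")[0]); // prints 63
    }
}
```

Writing the escapes as \u0086 rather than \x86 is deliberate: \x takes a variable number of hex digits in C#, so a string such as "\x86ab" would be read as a single character.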
