Manipulating both Unicode and ASCII character sets in C#
Problem description
I have this mapping in my C# application:
string[,] unicode2Ascii = {
    { "&#3001;", "\x86" }
};
&#3001; is the Unicode value of the Tamil letter "ஹ". This is how MS Word saved the character: as a numeric character reference to its code point (decimal 3001, i.e. U+0BB9). I am trying to map these Unicode value "strings" to single-byte values (0xFF or less) so that the output works on systems without Unicode support.
I am trying to use string.Replace like this:
S = S.Replace(unicode2Ascii[0,0], unicode2Ascii[0,1]);
However, the resulting output contains a ? instead of the actual byte 0x86. Any pointers on how I could set the encoding of the second element of that array to something like windows-1252?
Or is there a better way to do this conversion?
Thanks in advance.
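A minimal sketch of where the ? most likely comes from, assuming the replaced string is later written out with an 8-bit encoding (the sample text and the index 4 are hypothetical; written as C# 9 top-level statements). string.Replace itself stores U+0086 correctly; the ? only appears when a character is encoded with an Encoding that cannot represent it, such as Encoding.ASCII. ISO-8859-1 maps U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF, whereas windows-1252 assigns byte 0x86 to U+2020 (†) rather than to U+0086, so it is not a direct char-to-byte mapping.

```csharp
using System;
using System.Text;

// Hypothetical input: text as MS Word saved it, with the Tamil letter
// written as the numeric character reference "&#3001;" (U+0BB9).
string s = "abc &#3001; xyz";

// The in-memory replacement works: position 4 now holds U+0086.
string replaced = s.Replace("&#3001;", "\x86");
Console.WriteLine((int)replaced[4]); // 134 (0x86)

// Encoding.ASCII cannot represent U+0086; its default fallback writes '?' (0x3F).
byte[] ascii = Encoding.ASCII.GetBytes(replaced);
Console.WriteLine(ascii[4].ToString("X2")); // 3F

// ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes, so U+0086 becomes 0x86.
byte[] latin1 = Encoding.GetEncoding("iso-8859-1").GetBytes(replaced);
Console.WriteLine(latin1[4].ToString("X2")); // 86
```

In other words, the character to store in the table is fine as "\x86"; what matters is which encoding is used when the string is turned into bytes.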
Answer
Not sure if this helps, but the Tamil code page "57004 - ISCII Tamil" is supported by Windows.
It does not give the same translation for the example character above, though: for 'ஹ' it gives 216 (0xD8). Perhaps a different code page needs to be used?
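using System.Text;

// On .NET Core / .NET 5+, code pages beyond the built-in Unicode, ASCII and Latin-1 ones
// must first be registered via the System.Text.Encoding.CodePages package:
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
string tamilUnicodeString = "ஹ";
Encoding encoding = Encoding.GetEncoding("x-iscii-ta"); // code page 57004
byte[] codepageBytes = encoding.GetBytes(tamilUnicodeString); // gives { 216 }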
Update
If you wish to take a Unicode file as input and transliterate characters to get a single-byte representation, the following should do the trick. The resulting array will hold your single-byte representation provided your dictionary covers every character; a character with no entry (and above U+00FF) is typically written as ?:
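using System.Collections.Generic;
using System.Text;

// Maps each Tamil character to the single-byte value used by the target system.
Dictionary<char, char> lookup = new Dictionary<char, char>
{
    { 'ஹ', '\x86' },
    { 'இ', '\x87' },
    //next pair...,
    //etc, etc.
};
string input = "ஹஇதில் உள்ள தமிழ் எழுத்துக்கள் சரியாகத் தெரிந்தால்";
char[] chars = input.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
    // Replace the character if the table has an entry for it; otherwise leave it as-is.
    if (lookup.TryGetValue(chars[i], out char replaceChar))
    {
        chars[i] = replaceChar;
    }
}
// ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes, so '\x86' becomes 0x86.
// Characters still above U+00FF (no table entry) are typically written as '?'.
byte[] output = Encoding.GetEncoding("iso-8859-1").GetBytes(chars);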
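A stricter variant of the loop above, offered as a sketch under a few assumptions: the two-entry table and the helper name Transliterate are hypothetical, System.Net.WebUtility.HtmlDecode is used to turn numeric references such as "&#3001;" (as saved by Word) back into real characters before the lookup, and the encoder is created with EncoderFallback.ExceptionFallback so that a character missing from the table raises EncoderFallbackException instead of silently becoming ?. It is written as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical mapping table: add one entry per Tamil character you need.
var lookup = new Dictionary<char, char>
{
    { 'ஹ', '\x86' },
    { 'இ', '\x87' },
};

// Decode numeric character references: &#3001; is U+0BB9 (ஹ), &#2951; is U+0B87 (இ).
string input = WebUtility.HtmlDecode("&#3001;&#2951;");

// Latin-1 encoder that throws instead of writing '?' for unencodable characters.
Encoding strictLatin1 = Encoding.GetEncoding(
    "iso-8859-1", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

// Hypothetical helper: replace mapped characters, then encode one byte per char.
byte[] Transliterate(string text)
{
    char[] chars = text.ToCharArray();
    for (int i = 0; i < chars.Length; i++)
    {
        if (lookup.TryGetValue(chars[i], out char replacement))
        {
            chars[i] = replacement;
        }
    }
    return strictLatin1.GetBytes(chars);
}

byte[] output = Transliterate(input);
Console.WriteLine(BitConverter.ToString(output)); // 86-87

try
{
    Transliterate("ஹத"); // 'த' has no entry in the table
}
catch (EncoderFallbackException ex)
{
    // CharUnknown reports the first character that could not be encoded.
    Console.WriteLine($"Unmapped character U+{(int)ex.CharUnknown:X4}"); // U+0BA4
}
```

Failing loudly like this makes it easy to find which characters still need a table entry, rather than discovering stray ? bytes on the non-Unicode system.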