C# Encoding: Converting Latin to Hebrew

Question

I'm trying to fetch and parse an online Excel document that is written in Hebrew but, unfortunately, stored in a non-Hebrew encoding.

As an example, I'm trying to convert the following string, which serves as the first sheet name, to Hebrew using C# code, but I'm unable to do so: "âìéåï_1".

I know the above is convertible because when I open it in Notepad++ and select Encoding/Character Sets/Hebrew/Windows-1255, I see "גליון_1", which is the correct Hebrew representation of the string.

I'm using the code below:

            string str = "âìéåï_1";

            Encoding windows = Encoding.GetEncoding("Windows-1255");
            Encoding ascii = Encoding.GetEncoding("Windows-1252");
            byte[] asciiBytes = ascii.GetBytes(str);
            byte[] windowsBytes = Encoding.Convert(ascii, windows, asciiBytes);

            char[] windowsChars = new char[windows.GetCharCount(windowsBytes, 0, windowsBytes.Length)];
            windows.GetChars(windowsBytes, 0, windowsBytes.Length, windowsChars, 0);
            string windowsString = new string(windowsChars);

I assumed that the encoding of the original string is Windows-1252, because when I paste it into Notepad++ and change the encoding to Windows-1252, the string remains the same.

I'm probably doing something wrong here. Does anyone know how to convert the above correctly?

Thanks,

Mikey

Solution

const string Str = "âìéåï_1";

Encoding latinEncoding = Encoding.GetEncoding("Windows-1252");
Encoding hebrewEncoding = Encoding.GetEncoding("Windows-1255");

// Recover the raw bytes that were wrongly decoded as Windows-1252.
byte[] latinBytes = latinEncoding.GetBytes(Str);

// Decode those same bytes as Windows-1255 (Hebrew).
string hebrewString = hebrewEncoding.GetString(latinBytes);

hebrewString:

גליון_1

In your example, "Windows-1252" is not actually ASCII; it is extended ASCII. For some reason, Encoding.Convert with these two encodings cannot convert the extended ASCII range, so every character above 127 comes out as 63 (i.e. '?'). When "converting" from one extended ASCII byte[] to another, I would expect the bytes to stay the same; only when you decode them into a .NET Unicode string would I expect them to differ. I'm not sure why Convert turns characters above 127 into '?'.
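
A likely explanation for the '?' output, offered as a sketch rather than a definitive account: Encoding.Convert transcodes. It decodes the source bytes into Unicode characters and then re-encodes those characters in the target encoding. Letters such as 'â' have no slot in Windows-1255, so the encoder has to substitute something else. What the question needs is to reinterpret the same bytes under a different code page, which is what the GetBytes/GetString pair above does. The names EncodingDemo, Reinterpret and Transcode below are hypothetical, and the RegisterProvider call assumes .NET Core or .NET 5+, where the Windows code pages are not available by default.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the names are illustrative, not from the answer.
public static class EncodingDemo
{
    static EncodingDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where Windows-1252 and
        // Windows-1255 must be registered first. On .NET Framework this line
        // can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Reinterpret: recover the raw bytes via Windows-1252, then decode
    // those unchanged bytes as Windows-1255.
    public static string Reinterpret(string mojibake)
    {
        Encoding latin = Encoding.GetEncoding("windows-1252");
        Encoding hebrew = Encoding.GetEncoding("windows-1255");
        byte[] rawBytes = latin.GetBytes(mojibake);
        return hebrew.GetString(rawBytes);
    }

    // Transcode: Encoding.Convert decodes the bytes to Unicode characters
    // and re-encodes those characters as Windows-1255. The Latin letters
    // are not representable there, so they are replaced on the way out.
    public static string Transcode(string mojibake)
    {
        Encoding latin = Encoding.GetEncoding("windows-1252");
        Encoding hebrew = Encoding.GetEncoding("windows-1255");
        byte[] converted = Encoding.Convert(latin, hebrew, latin.GetBytes(mojibake));
        return hebrew.GetString(converted);
    }
}
```

Calling EncodingDemo.Reinterpret("âìéåï_1") should give the Hebrew sheet name shown above, while EncodingDemo.Transcode on the same input loses the accented letters, which matches the behaviour described in the answer.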
