Why are ASCII values of a byte different when cast as Int32?


Question

I'm in the process of creating a program that will scrub extended ASCII characters from text documents. I'm trying to understand how C# is interpreting the different character sets and codes, and am noticing some oddities.

Consider:

using System;
using System.Text;

namespace ASCIITest
{
    class Program
    {
        static void Main(string[] args)
        {
            string value = "Slide™1½”C4®";
            byte[] asciiValue = Encoding.ASCII.GetBytes(value);   // byte array
            char[] array = value.ToCharArray();                   // char array
            Console.WriteLine("CHAR\tBYTE\tINT32"); 
            for (int i = 0; i < array.Length; i++)
            {
                char  letter     = array[i];
                byte  byteValue  = asciiValue[i];
                Int32 int32Value = array[i];
                // Print the character, its ASCII-encoded byte, and its UTF-16 code unit value.
                Console.WriteLine("{0}\t{1}\t{2}", letter, byteValue, int32Value);
            }
            Console.ReadLine();
        }
    }
}

Output from program

CHAR    BYTE    INT32
S       83      83
l       108     108
i       105     105
d       100     100
e       101     101
T       63      8482      <- trademark symbol
1       49      49
½       63      189       <- fraction
"       63      8221      <- smartquotes
C       67      67
4       52      52
r       63      174       <- registered trademark symbol

In particular, I'm trying to understand why the extended ASCII characters (the ones with my notes added to the right of the third column) show up with the correct value when cast as int32, but all show up as 63 when cast as the byte value. What's going on here?

Solution

Encoding.ASCII.GetBytes replaces every character outside the ASCII range (0-127) with a question mark (code 63).

Because your string contains characters outside that range, your asciiValue has ? in place of all the interesting symbols such as ™, whose Char (Unicode) representation is 8482, which is indeed outside the 0-127 range.
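
The replacement described above can be seen by encoding and then decoding the question's sample string. This is a minimal sketch; the class and method names (AsciiRoundTrip, Run) are made up for illustration and are not part of the original program.

```csharp
using System;
using System.Text;

// Hypothetical helper, named here only for illustration.
public static class AsciiRoundTrip
{
    // Encodes the text as ASCII and decodes the bytes again.
    // Characters above U+007F were already replaced by '?' (63) during
    // encoding, so the original characters cannot be recovered from the bytes.
    public static string Run(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

Calling AsciiRoundTrip.Run("Slide™1½”C4®") should return "Slide?1??C4?": the four non-ASCII characters all collapse to the same byte, which is why the BYTE column shows 63 for each of them.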

Converting a string to a char array does not modify the values of the characters, so you still have the original Unicode codes (char is essentially an unsigned 16-bit integer holding a UTF-16 code unit). Casting it to the wider integer type Int32 does not change the value.
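
Since the char values survive intact, one way to approach the scrubbing task from the question is to filter on the numeric value of each char instead of going through Encoding.ASCII. The sketch below assumes that "scrub" means dropping every character above 127; the names AsciiScrubber and RemoveNonAscii are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper, named here only for illustration.
public static class AsciiScrubber
{
    // Keeps only chars whose UTF-16 code unit value is in the ASCII range (0-127).
    // Surrogate halves (0xD800-0xDFFF) are above 127, so characters outside the
    // Basic Multilingual Plane are removed as well.
    public static string RemoveNonAscii(string text)
    {
        return new string(text.Where(c => c <= 127).ToArray());
    }
}
```

For the sample string, AsciiScrubber.RemoveNonAscii("Slide™1½”C4®") should give "Slide1C4". Replacing the characters with something readable (for example "(TM)") would need a lookup table instead of a plain filter.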

Below are possible conversions of that character into bytes/integers:

var value = "™";
var ascii = Encoding.ASCII.GetBytes(value)[0]; // 63 (`?`) - outside 0-127 range
var castToByte = (byte)(value[0]); // 34 = 8482 % 256 (unchecked cast keeps only the low byte)
var asInt16 = (Int16)value[0]; // 8482
var asInt32 = (Int32)value[0]; // 8482

Details are available in the documentation for the ASCIIEncoding class:

ASCIIEncoding corresponds to the Windows code page 20127. Because ASCII is a 7-bit encoding, ASCII characters are limited to the lowest 128 Unicode characters, from U+0000 to U+007F. If you use the default encoder returned by the Encoding.ASCII property or the ASCIIEncoding constructor, characters outside that range are replaced with a question mark (?) before the encoding operation is performed.
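
The quoted passage refers to the default encoder. As a sketch of the alternative, an ASCII encoding can be requested with a different encoder fallback through Encoding.GetEncoding, so that non-ASCII input either throws or is dropped instead of becoming '?'. The class and member names below (AsciiFallbacks, Strict, Dropping, IsPureAscii, DropNonAscii) are hypothetical, and using an empty replacement string to drop characters is an assumption about what you want, not something the original answer prescribes.

```csharp
using System;
using System.Text;

// Hypothetical helper, named here only for illustration.
public static class AsciiFallbacks
{
    // ASCII encoding that throws EncoderFallbackException on non-ASCII input.
    public static readonly Encoding Strict = Encoding.GetEncoding(
        "us-ascii", new EncoderExceptionFallback(), new DecoderExceptionFallback());

    // ASCII encoding that substitutes an empty string for non-ASCII input,
    // which effectively drops those characters.
    public static readonly Encoding Dropping = Encoding.GetEncoding(
        "us-ascii", new EncoderReplacementFallback(string.Empty), new DecoderExceptionFallback());

    // Returns true when every character can be encoded as ASCII.
    public static bool IsPureAscii(string text)
    {
        try
        {
            Strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    // Encodes with the dropping fallback and decodes the remaining bytes.
    public static string DropNonAscii(string text)
    {
        return Dropping.GetString(Dropping.GetBytes(text));
    }
}
```

With these, AsciiFallbacks.IsPureAscii("Slide™") should be false, and AsciiFallbacks.DropNonAscii("Slide™1½”C4®") should yield "Slide1C4".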
