Why are ASCII values of a byte different when cast as Int32?
Problem description
I'm in the process of creating a program that will scrub extended ASCII characters from text documents. I'm trying to understand how C# interprets the different character sets and codes, and am noticing some oddities.

Consider:

using System;
using System.Text;

namespace ASCIITest
{
    class Program
    {
        static void Main(string[] args)
        {
            string value = "Slide™1½”C4®";
            byte[] asciiValue = Encoding.ASCII.GetBytes(value); // byte array: '?' (63) replaces anything outside ASCII
            char[] array = value.ToCharArray(); // char array: the original UTF-16 code units

            Console.WriteLine("CHAR\tBYTE\tINT32");
            for (int i = 0; i < array.Length; i++)
            {
                char letter = array[i];
                byte byteValue = asciiValue[i];
                Int32 int32Value = array[i]; // implicit widening conversion from char

                Console.WriteLine("{0}\t{1}\t{2}", letter, byteValue, int32Value);
            }
            Console.ReadLine();
        }
    }
}

Output from program:

CHAR  BYTE  INT32
S     83    83
l     108   108
i     105   105
d     100   100
e     101   101
T     63    8482   <- trademark symbol
1     49    49
½     63    189    <- fraction
"     63    8221   <- smartquotes
C     67    67
4     52    52
r     63    174    <- registered trademark symbol

In particular, I'm trying to understand why the extended ASCII characters (the ones with my notes added to the right of the third column) show up with the correct value when cast as int32, but all show up as 63 when cast as the byte value. What's going on here?

Recommended answer

The ASCII.GetBytes conversion replaces all characters outside the ASCII range (0-127) with a question mark (code 63).

Since your string contains characters outside that range, your asciiValue has ? in place of all the interesting symbols such as ™, whose char (Unicode) representation is 8482, which is indeed outside the 0-127 range.

Converting a string to a char array does not modify the values of the characters, so you still have the original Unicode codes (char is essentially a 16-bit unsigned integer). Casting it to the longer integer type Int32 does not change the value.

Below are possible conversions of that character into bytes/integers:

var value = "™";
var ascii = Encoding.ASCII.GetBytes(value)[0]; // 63 ('?'): the character is outside the 0-127 range
var castToByte = (byte)(value[0]); // 34 = 8482 % 256 (only the low byte survives the cast)
var asInt16 = (Int16)value[0]; // 8482
var asInt32 = (Int32)value[0]; // 8482

Details are available in the ASCIIEncoding Class documentation:

ASCIIEncoding corresponds to the Windows code page 20127. Because ASCII is a 7-bit encoding, ASCII characters are limited to the lowest 128 Unicode characters, from U+0000 to U+007F. If you use the default encoder returned by the Encoding.ASCII property or the ASCIIEncoding constructor, characters outside that range are replaced with a question mark (?) before the encoding operation is performed.
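
Since the stated goal is to scrub characters outside the ASCII range, one possible approach is to test each char's numeric value directly instead of going through Encoding.ASCII, which silently substitutes a question mark. The following is only a sketch under that assumption and is not part of the original answer; the AsciiScrubber class and StripNonAscii method are hypothetical names chosen for illustration. It also shows how an encoder built with an exception fallback reports out-of-range characters rather than replacing them.

```csharp
using System;
using System.Linq;
using System.Text;

static class AsciiScrubber
{
    // Hypothetical helper: keeps only chars whose code is in the ASCII range 0-127.
    public static string StripNonAscii(string input)
    {
        return new string(input.Where(c => c <= 127).ToArray());
    }

    static void Main()
    {
        string value = "Slide™1½”C4®";

        // The four non-ASCII characters are dropped.
        Console.WriteLine(StripNonAscii(value)); // prints "Slide1C4"

        // An ASCII encoding with exception fallbacks throws instead of writing '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strictAscii.GetBytes(value);
        }
        catch (EncoderFallbackException ex)
        {
            // ex.Index is the position of the character that could not be encoded.
            Console.WriteLine("Non-ASCII character at index {0}", ex.Index);
        }
    }
}
```

Filtering on the char value keeps the remaining text intact, whereas round-tripping through Encoding.ASCII would leave a ? wherever a symbol used to be. Whether dropping the characters or replacing them is appropriate depends on the documents being cleaned.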