Encoding used in cast from char to byte

Problem description

Take a look at the following C# code (function extracted from the BuildProtectedURLWithValidity function in http://wmsauth.org/examples):

byte[] StringToBytesToBeHashed(string to_be_hashed) {
    byte[] to_be_hashed_byte_array = new byte[to_be_hashed.Length];
    int i = 0;
    foreach (char cur_char in to_be_hashed)
    {
        to_be_hashed_byte_array[i++] = (byte)cur_char;
    }
    return to_be_hashed_byte_array;
}

My question is: what does the cast from char to byte do in terms of encoding?

I guess it does nothing in terms of encoding. But does that mean Encoding.Default is the one being used, so that the bytes returned depend on how the framework encodes the underlying string on the specific operating system?

And besides, is a char actually bigger than a byte (I'm guessing 2 bytes), so that the cast drops one of its two bytes?

I was thinking of replacing all this with:

Encoding.UTF8.GetBytes(stringToBeHashed)

What do you think?

Solution

The .NET Framework uses Unicode to represent all its characters and strings. The integer value of a char (which you may obtain by casting to int) is equivalent to its UTF-16 code unit. For characters in the Basic Multilingual Plane (which constitute the majority of characters you'll ever encounter), this value is the Unicode code point.

The documentation for the Char structure puts it this way: "The .NET Framework uses the Char structure to represent a Unicode character. The Unicode Standard identifies each Unicode character with a unique 21-bit scalar number called a code point, and defines the UTF-16 encoding form that specifies how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure. The value of a Char object is its 16-bit numeric (ordinal) value."
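To make the code unit versus code point distinction concrete, here is a minimal sketch. The class and method names (CodeUnitDemo, FirstCodePoint) are made up for illustration; the calls themselves (string.Length, char.IsHighSurrogate, char.ConvertToUtf32) are standard .NET. Characters are written as escapes so the result does not depend on how the source file is saved.

```csharp
using System;

static class CodeUnitDemo
{
    // Hypothetical helper: returns the Unicode code point that starts at index 0,
    // combining a surrogate pair into one value when necessary.
    public static int FirstCodePoint(string s) => char.ConvertToUtf32(s, 0);

    static void Main()
    {
        string bmp = "\u0144";         // 'ń', U+0144, inside the Basic Multilingual Plane
        string astral = "\U0001F600";  // U+1F600, outside the BMP

        Console.WriteLine(bmp.Length);                       // 1 (one UTF-16 code unit)
        Console.WriteLine((int)bmp[0]);                      // 324 (code unit == code point)
        Console.WriteLine(astral.Length);                    // 2 (a surrogate pair)
        Console.WriteLine(char.IsHighSurrogate(astral[0]));  // True
        Console.WriteLine(FirstCodePoint(astral));           // 128512 (0x1F600)
    }
}
```

For a character outside the BMP, neither char of the pair equals the code point on its own, so a per-char cast to byte discards even more information there.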

Casting a char to byte will result in data loss for any character whose value is larger than 255, because only the low 8 bits of the 16-bit value are kept. Try running the following simple example to understand why:

char c1 = 'D';        // code point 68
byte b1 = (byte)c1;   // b1 is 68

char c2 = 'ń';        // code point 324
byte b2 = (byte)c2;   // b2 is 68 too!
                      // 324 % 256 == 68

Yes, you should definitely use Encoding.UTF8.GetBytes instead.
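As a rough side-by-side comparison, the sketch below runs the cast-based conversion from the question and Encoding.UTF8.GetBytes over the same inputs. The names HashInput, ByTruncation and ByUtf8 are invented for this example, and the unchecked keyword is added only so the truncation behaves the same if the project is compiled with overflow checking enabled.

```csharp
using System;
using System.Text;

static class HashInput
{
    // Mirrors the loop in the question: keeps only the low 8 bits of each UTF-16 code unit.
    public static byte[] ByTruncation(string s)
    {
        var bytes = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            bytes[i] = unchecked((byte)s[i]);
        }
        return bytes;
    }

    // Lossless alternative: encodes the string as UTF-8.
    public static byte[] ByUtf8(string s) => Encoding.UTF8.GetBytes(s);

    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(ByTruncation("D")));       // 44
        Console.WriteLine(BitConverter.ToString(ByTruncation("\u0144")));  // 44 (collides with "D")
        Console.WriteLine(BitConverter.ToString(ByUtf8("D")));             // 44
        Console.WriteLine(BitConverter.ToString(ByUtf8("\u0144")));        // C5-84
        Console.WriteLine(BitConverter.ToString(ByTruncation("\u00E9")));  // E9
        Console.WriteLine(BitConverter.ToString(ByUtf8("\u00E9")));        // C3-A9
    }
}
```

The two approaches agree for ASCII input, but they differ for every character above U+007F, including ones such as 'é' (U+00E9) that the cast happens to preserve. If the hash has to match one computed elsewhere with the original cast-based code, switching to UTF-8 would change the result for such input, so both sides would presumably need to change together.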
