Can we simplify this string encoding code?

Problem description

Is it possible to simplify this code into a cleaner/faster form?

StringBuilder builder = new StringBuilder();
var encoding = Encoding.GetEncoding(936);

// convert the text into a byte array
byte[] source = Encoding.Unicode.GetBytes(text);

// convert that byte array to the new codepage. 
byte[] converted = Encoding.Convert(Encoding.Unicode, encoding, source);

// take multi-byte characters and encode them as separate ascii characters 
foreach (byte b in converted)
    builder.Append((char)b);

// return the result
string result = builder.ToString();

Simply put, it takes a string with Chinese characters such as 鄆 and converts them to ài.

For example, that Chinese character is 37126 in decimal, or 0x9106 in hex.

See http://unicodelookup.com/#0x9106/1

Converted to a byte array, we get [145, 6] in high-byte-first order (145 * 256 + 6 = 37126). When encoded in CodePage 936 (Simplified Chinese), we get [224, 105]. If we break this byte array down into individual characters, we get 224 = 0xE0 = à and 105 = 0x69 = i in Unicode.

See http://unicodelookup.com/#0x00e0/1 and http://unicodelookup.com/#0x0069/1
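
To make the byte arithmetic above concrete, here is a minimal sketch that prints each intermediate value for the sample character. The class name `ByteWalkthrough` is made up for illustration, and the `RegisterProvider` call is an assumption about the runtime: it is needed on .NET Core / .NET 5+, where code page 936 is not available by default, and can be dropped on .NET Framework.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ByteWalkthrough
{
    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as 936 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string text = "\u9106"; // the sample character from the question

        // The UTF-16 code unit as a number.
        Console.WriteLine((int)text[0]); // 37126

        // Encoding.Unicode is little-endian UTF-16, so the low byte comes first.
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        Console.WriteLine(string.Join(", ", utf16)); // 6, 145

        // Code page 936 bytes; the question reports 224, 105 for this character.
        byte[] cp936 = Encoding.GetEncoding(936).GetBytes(text);
        Console.WriteLine(string.Join(", ", cp936));

        // Widen each byte into its own char and print the resulting code points.
        string widened = new string(cp936.Select(b => (char)b).ToArray());
        Console.WriteLine(string.Join(", ", widened.Select(c => (int)c)));
    }
}
```

Note that `Encoding.Unicode.GetBytes` yields the two bytes in the opposite order from the [145, 6] written above; the arithmetic is the same either way.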

Thus, we're doing an encoding conversion and ensuring that all characters in our output Unicode string can be represented using at most two bytes.

Update: I need this final representation because this is the format my receipt printer is accepting. Took me forever to figure it out! :) Since I'm not an encoding expert, I'm looking for simpler or faster code, but the output must remain the same.

Update (Cleaner version):

return Encoding.GetEncoding("ISO-8859-1").GetString(Encoding.GetEncoding(936).GetBytes(text));

Solution

Well, for one, you don't need to convert the "built-in" string representation to a byte array before calling Encoding.Convert.

You could just do:

byte[] converted = Encoding.GetEncoding(936).GetBytes(text);

To then reconstruct a string from that byte array whereby the char values directly map to the bytes, you could do...

static string MangleTextForReceiptPrinter(string text) {
    return new string(
        Encoding.GetEncoding(936)
            .GetBytes(text)
            .Select(b => (char) b)
            .ToArray());
}

I wouldn't worry too much about efficiency; how many MB/sec are you going to print on a receipt printer anyhow?

Joe pointed out that there's an encoding that directly maps byte values 0-255 to code points, namely the age-old Latin1 (ISO-8859-1), which allows us to shorten the function to...

return Encoding.GetEncoding("Latin1").GetString(
           Encoding.GetEncoding(936).GetBytes(text)
       );
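
Since the output must stay the same, it may be worth checking that the variants agree before swapping one in. The following is a rough sketch of such a check; the class and method names (`ReceiptEncodingDemo`, `ViaStringBuilder`, `ViaLinq`, `ViaLatin1`) and the sample string are invented for illustration, and the `RegisterProvider` call is only needed on .NET Core / .NET 5+.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ReceiptEncodingDemo
{
    // The original approach from the question.
    public static string ViaStringBuilder(string text)
    {
        var builder = new StringBuilder();
        byte[] source = Encoding.Unicode.GetBytes(text);
        byte[] converted = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(936), source);
        foreach (byte b in converted)
            builder.Append((char)b);
        return builder.ToString();
    }

    // Encode straight to code page 936, then widen each byte to a char.
    public static string ViaLinq(string text) =>
        new string(Encoding.GetEncoding(936).GetBytes(text).Select(b => (char)b).ToArray());

    // Let ISO-8859-1 do the byte-to-char widening.
    public static string ViaLatin1(string text) =>
        Encoding.GetEncoding("ISO-8859-1").GetString(Encoding.GetEncoding(936).GetBytes(text));

    public static void Main()
    {
        // Assumption: .NET Core / .NET 5+, where code page 936 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string sample = "\u9106 receipt 123"; // hypothetical test input
        string a = ViaStringBuilder(sample);
        string b = ViaLinq(sample);
        string c = ViaLatin1(sample);

        // All three should produce the same string.
        Console.WriteLine(a == b && b == c);
    }
}
```

Running this over a handful of real receipt lines would be a cheap way to confirm the shorter version is a drop-in replacement.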

By the way, if this is a buggy Windows-only API (which it is, by the looks of it), you might be dealing with codepage 1252 instead (which is almost identical to Latin1, differing only in the byte range 0x80-0x9F). You might try Reflector to see what it's doing with your System.String before it sends it over the wire.
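
To see where the two code pages part ways, a small probe like the one below could help. It is only a sketch: the class name `CodePageProbe` and the probe bytes are chosen for illustration, and the `RegisterProvider` call is an assumption that applies to .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class CodePageProbe
{
    public static void Main()
    {
        // Assumption: .NET Core / .NET 5+, where code page 1252 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        Encoding cp1252 = Encoding.GetEncoding(1252);

        // 0xE0 decodes to the same character (U+00E0) in both encodings.
        byte[] same = { 0xE0 };
        Console.WriteLine(((int)latin1.GetString(same)[0]).ToString("X4")); // 00E0
        Console.WriteLine(((int)cp1252.GetString(same)[0]).ToString("X4")); // 00E0

        // 0x80 is a C1 control character in ISO-8859-1 but the euro sign in 1252.
        byte[] differs = { 0x80 };
        Console.WriteLine(((int)latin1.GetString(differs)[0]).ToString("X4")); // 0080
        Console.WriteLine(((int)cp1252.GetString(differs)[0]).ToString("X4")); // 20AC
    }
}
```

Because code page 936 can produce bytes in the 0x80-0x9F range, the choice between the two matters if the printer API really does use 1252.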
