Run length encoding of hexadecimal strings including newlines


Problem description


I am implementing run length encoding using the GZipStream class in a C# WinForms app.

Data is provided as a series of strings separated by newline characters, like this:

FFFFFFFF
FFFFFEFF
FDFFFFFF
00FFFFFF

Before compressing, I convert the string to a byte array, but doing so fails if newline characters are present.

Each newline is significant, but I am not sure how to preserve their position in the encoding.

Here is the code I am using to convert to a byte array:

    private static byte[] HexStringToByteArray(string _hex)
    {
        _hex = _hex.Replace("\r\n", "");
        if (_hex.Length % 2 != 0) throw new FormatException("Hex string length must be divisible by 2.");
        int l = _hex.Length / 2;
        byte[] b = new byte[l];
        for (int i = 0; i < l; i++)
            b[i] = Convert.ToByte(_hex.Substring(i * 2, 2), 16);
        return b;
    }

If the newlines are not removed, Convert.ToByte throws a FormatException with the message "Additional non-parsable characters are at the end of the string", which doesn't surprise me.

What would be the best way to make sure newline characters can be included properly?

Note: the compressed version of this string must itself be a string that can be included in an XML document.

Edit:

I have tried simply converting the string to a byte array without performing any binary conversion on it, but I am still having trouble with the compression. Here are the relevant methods:

    private static byte[] StringToByteArray(string _s)
    {
        Encoding enc = Encoding.ASCII;
        return enc.GetBytes(_s);
    }

    public static byte[] Compress(byte[] buffer)
    {
        MemoryStream ms = new MemoryStream();
        GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true);
        zip.Write(buffer, 0, buffer.Length);
        zip.Close();
        ms.Position = 0;

        byte[] compressed = new byte[ms.Length];
        ms.Read(compressed, 0, compressed.Length);

        byte[] gzBuffer = new byte[compressed.Length + 4];
        Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
        Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
        return gzBuffer;
    }

Solution

Firstly: are you certain that just compressing the text doesn't give much the same result as compressing the "converted to binary" form?

Assuming you want to go ahead with converting to binary, I can suggest two options:

  • At the start of each line, write a number stating how many bytes are in the line. Then when you decompress, you read and convert that many bytes, then write a newline. If you know that each line is always going to be less than 256 bytes long, you can just represent this as a single byte. Otherwise you might want a larger fixed size, or some variable size encoding (e.g. "while the top bit is set, this is still part of the number") - the latter gets hairy pretty quickly.
  • Alternatively, "escape" a newline by representing it as (say) 0xFF, 0x00. You'd then also need to escape a genuine 0xFF as (say) 0xFF 0xFF. When you read the data, if you read an 0xFF you'd then read the next byte to determine whether it represented a newline or a genuine 0xFF.
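To make the two options concrete, here is a minimal sketch of both. The class and method names (HexLineCodec, EncodeLengthPrefixed and so on) are hypothetical, made up for illustration. The sketch assumes that "\r\n" may be normalised to "\n", that every line holds an even number of hex digits, and, for the length-prefixed variant, that no line is longer than 255 bytes. Decoding always emits upper-case hex.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original question's code.
public static class HexLineCodec
{
    // Option 1: each line becomes [byte count][bytes...].
    public static byte[] EncodeLengthPrefixed(string text)
    {
        var output = new MemoryStream();
        foreach (string line in text.Replace("\r\n", "\n").Split('\n'))
        {
            if (line.Length % 2 != 0)
                throw new FormatException("Hex digits must come in pairs.");
            int count = line.Length / 2;
            if (count > 255)
                throw new FormatException("Line too long for a one-byte prefix.");
            output.WriteByte((byte)count);
            for (int i = 0; i < count; i++)
                output.WriteByte(Convert.ToByte(line.Substring(i * 2, 2), 16));
        }
        return output.ToArray();
    }

    public static string DecodeLengthPrefixed(byte[] data)
    {
        var lines = new List<string>();
        int pos = 0;
        while (pos < data.Length)
        {
            int count = data[pos++];              // how many bytes are in this line
            var sb = new StringBuilder(count * 2);
            for (int i = 0; i < count; i++)
                sb.Append(data[pos++].ToString("X2"));
            lines.Add(sb.ToString());
        }
        return string.Join("\n", lines);
    }

    // Option 2: newline -> 0xFF 0x00, genuine 0xFF -> 0xFF 0xFF.
    public static byte[] EncodeEscaped(string text)
    {
        string s = text.Replace("\r\n", "\n");
        var output = new MemoryStream();
        int i = 0;
        while (i < s.Length)
        {
            if (s[i] == '\n')
            {
                output.WriteByte(0xFF);
                output.WriteByte(0x00);
                i += 1;
                continue;
            }
            if (i + 2 > s.Length)
                throw new FormatException("Hex digits must come in pairs.");
            byte value = Convert.ToByte(s.Substring(i, 2), 16);
            output.WriteByte(value);
            if (value == 0xFF) output.WriteByte(0xFF); // double a genuine 0xFF
            i += 2;
        }
        return output.ToArray();
    }

    public static string DecodeEscaped(byte[] data)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] != 0xFF) { sb.Append(data[i].ToString("X2")); continue; }
            // 0xFF is the escape byte: the next byte says what it stands for.
            if (i + 1 >= data.Length)
                throw new FormatException("Dangling escape byte.");
            byte next = data[++i];
            if (next == 0x00) sb.Append('\n');
            else if (next == 0xFF) sb.Append("FF");
            else throw new FormatException("Unknown escape sequence.");
        }
        return sb.ToString();
    }
}
```

Note that with data like the sample in the question, which is mostly 0xFF, the escaped form writes two bytes for every genuine 0xFF, so the length-prefixed form is likely to be the smaller of the two before compression. Either way the result is binary and still has to be compressed and then converted to text before it can go into XML.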

EDIT: I believe your original approach was fundamentally flawed. Whatever you get out of GZipStream is not text, and shouldn't be treated as if it were text using Encoding. However, you can turn it into ASCII text very easily, by calling Convert.ToBase64String. By the way, another trick you've missed is to call ToArray on the MemoryStream, which will give you the contents as a byte[] with no extra messing around.
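As a rough sketch of that advice, the following compresses the text as-is (newlines included, with no hex-to-binary step), uses ToArray to get the compressed bytes, and uses Base64 to get a string that is safe to embed in XML. The class and method names (TextCompressor, CompressToBase64, DecompressFromBase64) are hypothetical, and the sketch assumes the input contains only ASCII characters, which holds for hex digits and newlines.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the original question's code.
public static class TextCompressor
{
    public static string CompressToBase64(string text)
    {
        byte[] raw = Encoding.ASCII.GetBytes(text);
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true keeps ms usable after the GZipStream is disposed.
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the remaining compressed data

            // ToArray returns the written bytes without any manual Read calls.
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    public static string DecompressFromBase64(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var input = new MemoryStream(compressed))
        using (var zip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zip.CopyTo(output);
            return Encoding.ASCII.GetString(output.ToArray());
        }
    }
}
```

Because the decompressed stream simply runs to its end, this version does not need the four-byte length header that the question's Compress method prepends. The newlines survive because they are compressed as ordinary characters.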
