将十六进制字符串解码为可读文本 [英] Decode a hex string into readable text

查看:118
本文介绍了将十六进制字符串解码为可读文本的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

嗨大家好,

我的SQL Server上的列中有一串十六进制字符。

数据库中的每一行都是来自我的电子邮件的备份Exchange服务器。



一栏基本上是电子邮件的全部内容。

我需要将十六进制解码成可读字符,这样我就可以对电子邮件进行其他分析了。



我是通往实现解码部分。我已经发现十六进制字符串也是压缩的。我已经关闭了压缩,所以我只需要一次解决一件事。



我有一个简单的控制台应用程序,从标准输入和写入中获取字符串标准输出。



到目前为止一切顺利。当我关闭压缩时它会起作用。



Hi Guys,
I have a string of hex characters in a column on my SQL Server.
Each row in the database is a backup of an email from my Exchange server.

One column is basically the entire contents of an email.
I need to decoded the hex back into readable characters so I can perform other analysis on the email.

I'm part way to achieving the decode part. I have figured out that the hex string is also compressed. I have turned off the compression so I only have to solve one thing at a time.

I have a simple console application that takes the string from the standard input and writes to standard output.

So far so good. It works when I turn compression off.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace HexToText
{
    class Program
    {
        static void Main(string[] args)
        {
            TextReader textReader = Console.In;
            string input = textReader.ReadToEnd();
            string result = hexStringToString(input);
            Console.WriteLine(result);
        }

        static string hexStringToString(string hexString)
        {
            if (hexString == null || (hexString.Length & 1) == 1)
            {
                throw new ArgumentException();
            }
            var sb = new StringBuilder();
            for (var i = 0; i < hexString.Length; i += 2)
            {
                var hexChar = hexString.Substring(i, 2);
                if (hexChar != "0x")
                {
                    sb.Append((char)Convert.ToByte(hexChar, 16));
                }
            }
            return sb.ToString();
        }
    }
}





未压缩的十六进制字符串就像这样......

0x52656365697665643A2066726F6D20434F4D50414E595352562E63616D70736965< snip>

解码结果=收到:来自COMPANYSRV.campsie



压缩的十六进制字符串以这样的方式启动...(不是与上面相同的电子邮件,但它会非常相似)

0xCD396B8C23C959E599BB7DCCED240B977B26DCD47995DBDBBB69BBDBAFB17B6< snip>



我想这个我必须首先将其解码回二进制文件然后解压缩。要么我的小程序没有将它正确解码回二进制文件,要么我还没有猜到备份软件用来压缩文本的内容。



我使用Visual Studio 2008和SQL Server 2008 R2和备份软件是Policy Patrol。



数据库表还有一个名为ZIPCOMPRESSION的列,为1或0取决于打开或关闭压缩。



有没有人有任何想法我如何阅读压缩文本。



提前感谢任何建议。

David



我尝试过的事情:



我试图用7zip和Windows内置ZIP打开生成的文件。



The uncompressed hex string starts like this...
0x52656365697665643A2066726F6D20434F4D50414E595352562E63616D70736965<snip>
Decoded Result = Received: from COMPANYSRV.campsie

The compressed hex string starts like this... (not the same email as above but it is going to be very similar)
0xCD396B8C23C959E599BB7DCCED240B977B26DCD47995DBDBBB69BBDBAFB17B6<snip>

I figure that I must first decode it back into a binary file then decompress it. Either my little program doesn't decode it properly back into a binary file or I haven't yet guessed what the backup software has used to compress the text.

I'm Using Visual Studio 2008 and SQL Server 2008 R2 and the backup software is Policy Patrol.

The database table also has a column named "ZIPCOMPRESSION" and is 1 or 0 depending on wether compression is on or off.

Does anyone have any ideas how I can read the compressed text.

Thanks in advance for any suggestions.
David

What I have tried:

I have tried to open the resulting file with 7zip and Windows built in ZIP.

推荐答案

使用SQL要做到这一点: DECOMPRESS(Transact-SQL) [ ^ ] - 因为它压缩了它,解压缩它最安全的方法是让它去做...
Use SQL to do it: DECOMPRESS (Transact-SQL)[^] - since it compressed it, the safest way to decompress it is to get it to do it...


好的,

我我想我已经提出了一个解决方案。

我正在尝试解码的RFC822_msg列实际上是一个FileStream列。

也就是说它是一个varbinary启用了FileStream的(max)列,实际数据直接存储在database.mdb文件之外的NTFS文件系统中。



经过多次阅读最佳实践和从来没有这样做我已经在FileStream数据文件夹上启用了NTFS文件压缩,并在备份应用程序中关闭了压缩。



我知道这可能会有未知的性能影响,因为我已经将压缩/解压缩从一个地方移动到另一个地方,但谁知道它可能会变得更好:)整体性能并不是什么大问题,因为几乎没有与备份数据库的在线用户交互。我打算将报告作为计划任务进行,如果有人真的想找到并恢复存档的电子邮件,请猜猜他们会问谁!在任何情况下,用于在备份中查找消息的搜索路由都不会查看此压缩数据。



我的整体存储大小可能会有初步增长文件系统试图压缩已压缩的数据。



我无法报告历史数据,因为它仍然是压缩的,但这并不是什么大问题,因为用户有兴趣报告过去一周左右的趋势。



不,我不得不把我的钱放在我的嘴里并让它发生但是至少我现在有一个可行的计划。



感谢所有想过这个问题的人:)



我将这个项目作为答案标记为下一阶段。





David



更新

这个接近有效。唯一令人困惑的是我需要使用转换SQL子句将数据检索为文本,尽管它仍然是十六进制编码而不再是二进制。



OK,
I think I have come up with a workaround solution.
The RFC822_msg column that I'm trying to decode is actually a FileStream column.
That is to say it is a varbinary(max) column with FileStream enabled and the actual data is stored directly in the NTFS file system outside of the database.mdb file.

After much reading of best practices and "never do this's" I have enabled NTFS file compression on the FileStream data folder and turned off compression in the Backup application.

I know this could potentially have an unknown performance impact as I have moved the compression / decompression from one place to another but who knows it may get better :) Overall performance isn't a big deal as there is almost no online user interaction with the backup database. I plan to do the reporting as a scheduled task and if someone really wants to find and restore an archived email guess who they are going to ask! The search routing for locating messages in the backup doesn't look through this compressed data in any case.

There could be an initial growth in my overall storage size as the file system tried to compress already compressed data.

I can't report on historic data as it is still compressed but this is no big deal as the users are interested in reporting on trends over the last week or so.

No I have to put my money where my mouth is and make it happen but at least I have a workable plan now.

Thanks to all those who thought about this problem :)

I'll mark this as an answer once I get the project to the next stage.


David

Update
This approached worked. The only thing that was still confusing was that I needed to used a convert SQL clause to retrieve the data as text although it was still Hex encoded just no longer binary.

CONVERT(VARCHAR(MAX),dbo.PP4_MAILBACKUP.RFC822_msg, 1) AS RFC822_txt





然后,现在返回的数据只是十六进制编码的文本。然后我可以使用上面的hexToString代码将其转换为可读文本并执行我的分析。我希望有一种内置的方法可以在某个地方进行这种转换,但是一旦我找到一个有效的解决方案,我倾向于停止寻找。



我的数据大小没有不论磁盘使用率是否相同都会爆炸。



我希望这可以帮助别人。

David



Then the data that was returned now was simply hex encoded text. I could then use my hexToString code above to convert it into readable text and perform my analysis. I'd expect there is a built in way to do this conversion somewhere but I tend to stop looking once I find a solution that works :)

My data size didn't explode either with disk usage about the same.

I hope this helps someone else.
David

这篇关于将十六进制字符串解码为可读文本的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆