How to read a mixed file of bytes and strings

Question

I have a mixed file that contains many text lines plus sections of binary (byte-encoded) data. Example:

--Begin Attach
Content-Info: /Format=TIF
Content-Description: 30085949.tif (TIF File)
Content-Transfer-Encoding: binary; Length=220096
II*II* Îh  ÿÿÿÿÿÿü³küìpsMg›Êq™Æ™Ôd™‡–h7ÃAøAú áùõ=6?Eã½/ô|û ƒú7z:>„Çÿý<þ¯úýúßj?å¿þÇéöûþ"«ÿ¾ÁøKøÈ%ŠdOÿÞÈ<,Wþ‡ÿ·ƒïüúCÿß%Ï$sŸÿÃÿ÷‡þåiò>GÈù#ä|‘ò:#ä|Š":#¢:;ˆèŽˆèʤV‘ÑÑÑÑÑÑÑÑÑçIþ×o(¿zHDDDDDFp'.Ñ:ˆR:aAràÁ¬LˆÈù!ÿÿï[ÿ¯Äàiƒ"VƒDÇ)Ê6PáÈê$9C"9C†‡CD¡pE@¦œÖ{i~Úý¯kköDœ4ÉU"8`ƒt!l2G
--End Attach--

I tried to read the file as text, line by line:

string[] lines = System.IO.File.ReadAllLines(@"C:\Users\Davide\Desktop\20041230000D.xmm");

I read the file line by line, and when a line equals "Content-Transfer-Encoding: binary; Length=220096", I read all the following lines and write them to a file named after the attachment (in this case 30085949.tif). But I am reading strings, not byte data, so the resulting file is damaged (I am currently testing with a TIFF file). Any suggestions?
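
To illustrate why the output is damaged, here is a minimal sketch (the class and method names are hypothetical, not from the question) that decodes a byte array as text and encodes it back. Bytes that are not valid in the chosen encoding get replaced during decoding, so the round trip does not reproduce the original data. File.ReadAllLines additionally strips line terminators, so any CR or LF bytes inside the binary section are lost as well.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // Returns true only if decoding the bytes as text and re-encoding
    // the resulting string reproduces the original bytes exactly.
    public static bool SurvivesTextRoundTrip(byte[] data, Encoding encoding)
    {
        string asText = encoding.GetString(data);
        byte[] back = encoding.GetBytes(asText);
        return data.SequenceEqual(back);
    }
}
```

Plain ASCII header lines survive this round trip, while a TIFF-like byte sequence containing values such as 0xFF does not when decoded as UTF-8.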

SOLUTION: Thanks for the replies. I adopted this solution: I built a LineReader class that extends BinaryReader:

    public class LineReader : BinaryReader
    {
        public LineReader(Stream stream, Encoding encoding)
            : base(stream, encoding)
        {
        }

        // Number of characters read by the last ReadLine call, including the line feed.
        // This equals the number of bytes only for single-byte encodings.
        public int currentPos;
        private StringBuilder stringBuffer;

        public string ReadLine()
        {
            currentPos = 0;
            char[] buf = new char[1];
            stringBuffer = new StringBuilder();

            // Read one character at a time until a line feed is found.
            while (base.Read(buf, 0, 1) > 0)
            {
                currentPos++;
                if (buf[0] == Microsoft.VisualBasic.Strings.ChrW(10))
                {
                    return stringBuffer.ToString();
                }
                stringBuffer.Append(buf[0]);
            }

            // End of stream reached without a trailing line feed.
            return stringBuffer.ToString();
        }
    }

Here Microsoft.VisualBasic.Strings.ChrW(10) is a line feed (the same character as the C# literal '\n'). When I parse my file:

    using (LineReader b = new LineReader(File.OpenRead(path), Encoding.Default))
    {
        int pos = 0;
        int length = (int)b.BaseStream.Length;
        while (pos < length)
        {
            string line = b.ReadLine();
            pos += (b.currentPos);

            if (!beginNextPart)
            {
                if (line.StartsWith(BEGINATTACH))
                {
                    beginNextPart = true;

                }
            }
            else
            {
                if (line.StartsWith(ENDATTACH))
                {
                    beginNextPart = false;
                }
                else
                {
                    if (line.StartsWith("Content-Transfer-Encoding: binary; Length="))
                    {
                        attachLength = Convert.ToInt32(line.Replace("Content-Transfer-Encoding: binary; Length=", ""));
                        byte[] attachData = b.ReadBytes(attachLength);
                        pos += (attachLength);
                        ByteArrayToFile(@"C:\users\davide\desktop\files.tif", attachData);
                    }
                }
            }
        }
    }

I read the byte length from the header line and then read the following n bytes.
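
The parsing loop above calls a ByteArrayToFile helper that is not shown and writes to a fixed path. The following is only a guess at what such a helper could look like, together with a hypothetical TryGetFileName method that pulls the attachment name out of the Content-Description header, assuming the header always has the shape shown in the sample ("Content-Description: 30085949.tif (TIF File)").

```csharp
using System;
using System.IO;

public static class AttachmentFiles
{
    private const string DescriptionPrefix = "Content-Description: ";

    // Hypothetical helper: extracts "30085949.tif" from
    // "Content-Description: 30085949.tif (TIF File)".
    // Returns false (and an empty name) for any other line.
    public static bool TryGetFileName(string line, out string fileName)
    {
        fileName = "";
        if (!line.StartsWith(DescriptionPrefix, StringComparison.Ordinal))
        {
            return false;
        }

        string rest = line.Substring(DescriptionPrefix.Length);
        int paren = rest.LastIndexOf(" (", StringComparison.Ordinal);
        string name = paren >= 0 ? rest.Substring(0, paren) : rest;

        // Keep only the file name part, dropping any directory component
        // that the current platform recognizes.
        fileName = Path.GetFileName(name.Trim());
        return fileName.Length > 0;
    }

    // Assumed stand-in for the ByteArrayToFile helper used in the question:
    // writes the raw bytes to the given path, overwriting an existing file.
    public static void ByteArrayToFile(string path, byte[] data)
    {
        File.WriteAllBytes(path, data);
    }
}
```

With these in place, the extracted name could be combined with an output directory via Path.Combine instead of hard-coding files.tif.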

Recommended answer

Your problem here is that a StreamReader assumes that it is the only thing reading the file, and as a result it reads ahead. Your best bet is to read the file as binary and use the appropriate text encoding to retrieve the string data out of your own buffer.
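
The read-ahead behavior can be observed with a small sketch (the ReadAheadDemo name is made up for illustration). It reads a single line through a StreamReader and then reports the position of the underlying stream. For a short input, the position typically ends up past the end of the first line, because the reader fills its internal buffer rather than stopping at the line break.

```csharp
using System.IO;
using System.Text;

public static class ReadAheadDemo
{
    // Reads one line through a StreamReader and returns the position of the
    // underlying stream afterwards, which shows how far the reader buffered ahead.
    public static long PositionAfterFirstLine(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            reader.ReadLine();
            return stream.Position;
        }
    }
}
```

This is why switching from the StreamReader to raw stream reads halfway through the file skips or loses data.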

Since you apparently don't mind reading the entire file into memory, you can start with:

byte[] buf = System.IO.File.ReadAllBytes(@"C:\Users\Davide\Desktop\20041230000D.xmm");

Then, assuming your text data is UTF-8:

int offset = 0;
int binaryLength = 0;
while (binaryLength == 0 && offset < buf.Length) {
    // In UTF-8, byte 10 (LF) can only be a line feed; it never appears inside a multi-byte sequence
    int eolIdx = Array.IndexOf(buf, (byte)10, offset);
    if (eolIdx < 0) break; // no further line terminator in the buffer
    string line = System.Text.Encoding.UTF8.GetString(buf, offset, eolIdx - offset);

    // Process your line appropriately here, and set binaryLength if you expect binary data to follow

    offset = eolIdx + 1; // first byte after the line feed
}

// You don't necessarily need to copy binary data out, but just to show where it is:
var binary = new byte[binaryLength];
Buffer.BlockCopy(buf, offset, binary, 0, binaryLength);

You might also want to call line.TrimEnd('\r') if you expect Windows-style (CRLF) line endings.
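
Putting the pieces of this answer together, a sketch of a complete extraction routine might look like the following. It assumes the header format from the question's sample (a "Content-Transfer-Encoding: binary; Length=N" line followed immediately by N bytes of payload) and UTF-8 text; the AttachmentExtractor class and its Extract method are illustrative names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class AttachmentExtractor
{
    // Header prefix taken from the sample file in the question.
    private const string LengthPrefix = "Content-Transfer-Encoding: binary; Length=";

    // Scans the buffer line by line and returns the raw bytes of every
    // attachment announced by a length header.
    public static List<byte[]> Extract(byte[] buf)
    {
        var attachments = new List<byte[]>();
        int offset = 0;
        while (offset < buf.Length)
        {
            int eolIdx = Array.IndexOf(buf, (byte)'\n', offset);
            if (eolIdx < 0) eolIdx = buf.Length; // last line has no terminator

            string line = Encoding.UTF8.GetString(buf, offset, eolIdx - offset).TrimEnd('\r');
            offset = Math.Min(eolIdx + 1, buf.Length);

            if (!line.StartsWith(LengthPrefix, StringComparison.Ordinal)) continue;

            int length = int.Parse(line.Substring(LengthPrefix.Length), CultureInfo.InvariantCulture);
            if (length < 0 || length > buf.Length - offset)
                throw new InvalidDataException("Declared length exceeds the remaining data.");

            // Copy the payload as bytes; it is never decoded as text.
            var data = new byte[length];
            Buffer.BlockCopy(buf, offset, data, 0, length);
            attachments.Add(data);
            offset += length; // text parsing resumes after the payload
        }
        return attachments;
    }
}
```

A caller could then pass in the result of File.ReadAllBytes and write each returned array with File.WriteAllBytes. Because the payload is skipped by its declared length, line feed bytes inside the binary data are never mistaken for line endings.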
