Why is a binary file so large compared to text?

Question

I have been storing a large set of data as text records in a text file:

    yyyyMMddTHHmmssfff double1 double2
    yyyyMMddTHHmmssfff double1 double2

However, when I read it, I need to parse each DateTime, which is quite slow for millions of records.
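
For reference, here is roughly what reading one line of that text format involves. This is a sketch assuming space-separated fields and "." as the decimal separator; TextRecordParser and ParseLine are hypothetical names, not taken from the question.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses one line in the question's text format,
// "yyyyMMddTHHmmssfff double1 double2", e.g. "20120101T093015250 1.25 1.5".
public static class TextRecordParser
{
    // 'T' is quoted so it is matched as a literal separator.
    private const string StampFormat = "yyyyMMdd'T'HHmmssfff";

    public static (DateTime Stamp, double Price1, double Price2) ParseLine(string line)
    {
        string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 3)
            throw new FormatException($"Expected 3 fields, got {parts.Length}: {line}");

        // Exact format + invariant culture: no culture-dependent guessing.
        DateTime stamp = DateTime.ParseExact(parts[0], StampFormat, CultureInfo.InvariantCulture);
        double p1 = double.Parse(parts[1], CultureInfo.InvariantCulture);
        double p2 = double.Parse(parts[2], CultureInfo.InvariantCulture);
        return (stamp, p1, p2);
    }
}
```

Every record pays for one date parse and two number parses here, which is the per-record cost the binary format is meant to avoid.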

So now I am trying a binary file instead, which I created by serializing my class.

That way I do not need to parse the DateTime.

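    using System;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable] // required by BinaryFormatter
    class MyRecord
    {
        public DateTime DT;
        public double Price1;
        public double Price2;

        // Serializes this one record. Each call produces a complete,
        // self-describing BinaryFormatter stream (header, assembly and
        // type names, field names and types, then the values).
        public byte[] SerializeToByteArray()
        {
            var bf = new BinaryFormatter();
            using (var ms = new MemoryStream())
            {
                bf.Serialize(ms, this);
                return ms.ToArray();
            }
        }
    }

    // allRecords and binFileName come from elsewhere in the program.
    using (var outBin = new BinaryWriter(File.Create(binFileName, 2048, FileOptions.None)))
    {
        foreach (MyRecord mr in allRecords)
        {
            outBin.Write(mr.SerializeToByteArray());
        }
    }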

The resulting binary file is, on average, three times the size of the text file.

Is this expected?

Edit 1

I am exploring protobuf (protobuf-net) as an alternative.

I want to do this with a using statement, to fit my existing structure:

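    private void DisplayBtn_Click(object sender, EventArgs e)
    {
        string fileName = dbDirectory + @"\nAD20120101.dat";

        using (FileStream fs = File.OpenRead(fileName))
        {
            MyRecord tr;
            // fs.CanRead stays true for as long as the stream is open,
            // so this condition alone never ends the loop.
            while (fs.CanRead)
            {
                // protobuf-net's Serializer.Deserialize<T> reads to the end of the stream.
                tr = Serializer.Deserialize<MyRecord>(fs);

                Console.WriteLine("> " + tr.ToString());
            }
        }
    }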

But after the first record, tr is full of zeroes.
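
If the file was written with one Serializer.Serialize call per record, a likely cause is that a protobuf message has no end marker: protobuf-net's Serializer.Deserialize<T> keeps reading until the end of the stream, merging all the back-to-back records into the first result. After that the stream is at its end, fs.CanRead is still true, and each further call returns an empty (all-zero) object. The usual protobuf-net remedy is to write each record with Serializer.SerializeWithLengthPrefix and read with Serializer.DeserializeWithLengthPrefix (or Serializer.DeserializeItems), passing a PrefixStyle such as PrefixStyle.Base128. The framing idea itself is sketched below with only BinaryWriter/BinaryReader, so it runs without the protobuf-net package. RecordFraming, WriteFramed and ReadAllFramed are hypothetical names, the payload bytes stand in for whatever serializer produced them, and the reader assumes a seekable stream such as a FileStream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper illustrating length-prefixed framing with only BCL types.
public static class RecordFraming
{
    // Writes each payload as [int32 length][payload bytes].
    public static void WriteFramed(Stream output, IEnumerable<byte[]> payloads)
    {
        // leaveOpen: true so the caller still owns the stream.
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            foreach (byte[] payload in payloads)
            {
                writer.Write(payload.Length); // 4-byte little-endian length prefix
                writer.Write(payload);        // raw bytes, no prefix of their own
            }
        }
    }

    // Reads frames until no bytes remain. The loop tests the position,
    // not Stream.CanRead, which stays true for any open readable stream.
    // Assumes a seekable stream (Position/Length must be supported).
    public static List<byte[]> ReadAllFramed(Stream input)
    {
        var result = new List<byte[]>();
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            while (input.Position < input.Length)
            {
                int length = reader.ReadInt32();
                byte[] payload = reader.ReadBytes(length);
                if (payload.Length != length)
                    throw new EndOfStreamException("Truncated record.");
                result.Add(payload);
            }
        }
        return result;
    }
}
```

Because each record carries its own length, the reader knows exactly where one record stops and the next begins, which is the information a bare sequence of protobuf messages lacks.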

Answer

Your archive likely has considerable overhead because BinaryFormatter writes type information (assembly name, type name, field names and types) along with each record.

Instead, make the whole collection serializable (if it isn't already) and serialize it in one go, so that the type information is written once rather than once per record.
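
BinaryFormatter has been marked obsolete since .NET 5 and is no longer usable by default in recent .NET versions, so here is a sketch of the same "write the whole collection in one go" idea using a plain BinaryWriter instead. This swaps in a hand-written fixed-width format rather than the BinaryFormatter approach above: one record count, then each record as its DateTime (via ToBinary) and two doubles, 24 bytes per record with no per-record type information. Record, RecordFile, Save and Load are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical record shape mirroring MyRecord from the question.
public struct Record
{
    public DateTime DT;
    public double Price1;
    public double Price2;
}

// Hypothetical helper: 4-byte count header + fixed 24-byte records, no type metadata.
public static class RecordFile
{
    public static void Save(Stream output, IReadOnlyList<Record> records)
    {
        using (var w = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            w.Write(records.Count);       // 4-byte record count
            foreach (Record r in records)
            {
                w.Write(r.DT.ToBinary()); // 8 bytes; FromBinary restores value and Kind
                w.Write(r.Price1);        // 8 bytes
                w.Write(r.Price2);        // 8 bytes
            }
        }
    }

    public static List<Record> Load(Stream input)
    {
        using (var rd = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            int count = rd.ReadInt32();
            var records = new List<Record>(count);
            for (int i = 0; i < count; i++)
            {
                // Object initializers assign in textual order, matching the write order.
                records.Add(new Record
                {
                    DT = DateTime.FromBinary(rd.ReadInt64()),
                    Price1 = rd.ReadDouble(),
                    Price2 = rd.ReadDouble(),
                });
            }
            return records;
        }
    }
}
```

With this layout a file of n records is exactly 4 + 24n bytes, and reading a record involves no text parsing at all.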
