Why should I use a human readable file format?


Problem Description


Why should I use a human readable file format in preference to a binary one? Is there ever a situation when this isn't the case?

EDIT: I did have this as an explanation when initially posting the question, but it's not so relevant now:

When answering this question I wanted to refer the asker to a standard SO answer on why using a human readable file format is a good idea. Then I searched for one and couldn't find one. So here's the question

Solution

It depends

The right answer is it depends. If you are writing audio/video data for instance, if you crowbar it into a human readable format, it won't be very readable! And word documents are the classic example where people have wished they were human readable, so more flexible, and by moving to XML MS are going that way.

Much more important than binary or text is a standard or not a standard. If you use a standard format, then chances are you and the next guy won't have to write a parser, and that's a win for everyone.
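A minimal sketch of that point, using JSON as the standard format (the field names are illustrative):

```python
import json

# With a standard format, the parser already ships with the language --
# neither you nor the next guy has to write one.
record = json.loads('{"name": "example", "version": 2}')
assert record["version"] == 2
print(record["name"])
```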

Following this are some opinionated reasons why you might want to choose one over the other, if you have to write your own format (and parser).

Why use human readable?

  1. The next guy. Consider the maintaining developer looking at your code 30 years or six months from now. Yes, he should have the source code. Yes, he should have the documents and the comments. But he quite likely won't. And having been that guy, and had to rescue or convert old, extremely valuable data, I'll thank you for making it something I can just look at and understand.
  2. Let me read AND WRITE it with my own tools. If I'm an emacs user I can use that. Or Vim, or notepad or ... Even if you've created great tools or libraries, they might not run on my platform, or even run at all any more. Also, I can then create new data with my tools.
  3. The tax isn't that big - storage is free. Nearly always disc space is free. And if it isn't you'll know. Don't worry about a few angle brackets or commas, usually it won't make that much difference. Premature optimisation is the root of all evil. And if you are really worried just use a standard compression tool, and then you have a small human readable format - anyone can run unzip.
  4. The tax isn't that big - computers are quick. It might be faster to parse binary. Until you need to add an extra column, or data type, or support both legacy and new files. (Though this is mitigated with Protocol Buffers.)
  5. There are a lot of good formats out there. Even if you don't like XML. Try CSV. Or JSON. Or .properties. Or even XML. Lots of tools exist for parsing these already in lots of languages. And it only takes 5mins to write them again if mysteriously all the source code gets lost.
  6. Diffs become easy. When you check in to version control it is much easier to see what has changed. And view it on the Web. Or your iPhone. Binary, you know something has changed, but you rely on the comments to tell you what.
  7. Merges become easy. You still get questions on the web asking how to append one PDF to another. This doesn't happen with text.
  8. Easier to repair if corrupted. Try and repair a corrupt text document vs. a corrupt zip archive. Enough said.
  9. Every language (and platform) can read or write it. Of course, binary is the native language for computers, so every language will support binary too. But a lot of the classic little tool scripting languages work a lot better with text data. I can't think of a language that works well with binary and not with text (assembler maybe) but not the other way round. And that means your programs can interact with other programs you haven't even thought of, or that were written 30 years before yours. There are reasons Unix was successful.
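The "read and write it with my own tools" and portability points above can be sketched with nothing but a standard library (the data is illustrative): a table written as plain CSV needs no custom parser to read back.

```python
import csv
import io

# Write a small table as plain CSV text -- any editor, language,
# or command-line tool can read this back.
rows = [["name", "score"], ["alice", "10"], ["bob", "7"]]
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Read it back with the standard library alone -- no custom parser.
parsed = list(csv.reader(io.StringIO(text)))
assert parsed == rows
print(parsed[1])
```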

Why not, and use binary instead?

  1. You might have a lot of data - terabytes maybe. And then a factor of 2 could really matter. But premature optimization is still the root of all evil. How about using a human readable one now, and converting later? It won't take much time.
  2. Storage might be free but bandwidth isn't (Jon Skeet in comments). If you are throwing files around the network then size can really make a difference. Even bandwidth to and from disc can be a limiting factor.
  3. Really performance intensive code. Binary can be seriously optimised. There is a reason databases don't normally have their own plain text format.
  4. A binary format might be the standard. So use PNG, MP3 or MPEG. It makes the next guy's job easier (for at least the next 10 years).
  5. There are lots of good binary formats out there. Some are global standards for that type of data. Or might be a standard for hardware devices. Some are standard serialization frameworks. A great example is Google Protocol Buffers. Another example: Bencode
  6. Easier to embed binary. Some data already is binary and you need to embed it. This works naturally in binary file formats, but looks ugly and is very inefficient in human readable ones, and usually stops them being human readable.
  7. Deliberate obscurity. Sometimes you don't want it obvious what your data is doing. Encryption is better than accidental security through obscurity, but if you are encrypting you might as well make it binary and be done with it.
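The size argument in points 1 and 2 is easy to see in a minimal sketch (the numbers are illustrative): the same integers packed as fixed-width binary versus serialized as JSON text.

```python
import json
import struct

values = list(range(1000))

# Fixed-width binary: 4 bytes per 32-bit integer.
binary = struct.pack(f"{len(values)}i", *values)

# The same data as human readable JSON text.
text = json.dumps(values).encode("utf-8")

print(len(binary))  # 4000 bytes
print(len(text))    # noticeably larger
assert len(binary) < len(text)
```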

Debatable

  1. Easier to parse. People have claimed that both text and binary are easier to parse. Now clearly the easiest to parse is when your language or library supports parsing, and this is true for some binary and some human readable formats, so it doesn't really favour either. Binary formats can clearly be chosen so they are easy to parse, but so can human readable (think CSV or fixed width), so I think this point is moot. Some binary formats can just be dumped into memory and used as is, so this could be said to be the easiest to parse, especially if numbers (not just strings) are involved. However I think most people would argue human readable parsing is (slightly) easier to debug, as it is easier to see what is going on in the debugger.
  2. Easier to control. Yes, it is more likely someone will mangle text data in their editor, or will moan when one Unicode format works and another doesn't. With binary data that is less likely. However, people and hardware can still mangle binary data. And you can (and should) specify a text encoding for human-readable data, either flexible or fixed.
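The encoding advice above can be sketched as follows (the filename is hypothetical): name the text encoding explicitly on both write and read, rather than relying on the platform default.

```python
import os
import tempfile

# Non-ASCII text that a platform-default encoding might mangle.
data = "naïve café ünïcode"

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Always specify the encoding explicitly on write...
with open(path, "w", encoding="utf-8") as f:
    f.write(data)

# ...and on read, so the round trip is lossless on any platform.
with open(path, encoding="utf-8") as f:
    assert f.read() == data

print("round trip ok")
```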

At the end of the day, I don't think either can really claim an advantage here.

Anything else

Are you sure you really want a file? Have you considered a database? :-)

Credits

A lot of this answer is merging together stuff other people wrote in other answers (you can see them there). And especially big thanks to Jon Skeet for his comments (both here and offline) for suggesting ways it could be improved.

