Why should I use a human readable file format?


Why should I use a human readable file format in preference to a binary one? Is there ever a situation when this isn't the case?

EDIT: I did have this as an explanation when initially posting the question, but it's not so relevant now:

When answering this question I wanted to refer the asker to a standard SO answer on why using a human readable file format is a good idea. Then I searched for one and couldn't find one. So here's the question.

Solution

It depends

The right answer is it depends. If you are writing audio/video data, for instance, then crowbarring it into a human readable format won't leave it very readable! And Word documents are the classic example where people have wished they were human readable, and therefore more flexible, and by moving to XML, MS is going that way.

Much more important than binary or text is a standard or not a standard. If you use a standard format, then chances are you and the next guy won't have to write a parser, and that's a win for everyone.

Following this are some opinionated reasons why you might want to choose one over the other, if you have to write your own format (and parser).

Why use human readable?

  1. The next guy. Consider the maintaining developer looking at your code 30 years or six months from now. Yes, he should have the source code. Yes, he should have the documents and the comments. But he quite likely won't. And having been that guy, who had to rescue or convert old, extremely valuable data, I'll thank you for making it something I can just look at and understand.
  2. Let me read AND WRITE it with my own tools. If I'm an emacs user I can use that. Or Vim, or notepad or ... Even if you've created great tools or libraries, they might not run on my platform, or even run at all any more. Also, I can then create new data with my tools.
  3. The tax isn't that big - storage is free. Nearly always disc space is free. And if it isn't you'll know. Don't worry about a few angle brackets or commas, usually it won't make that much difference. Premature optimisation is the root of all evil. And if you are really worried just use a standard compression tool, and then you have a small human readable format - anyone can run unzip.
  4. The tax isn't that big - computers are quick. It might be faster to parse binary. Until you need to add an extra column, or data type, or support both legacy and new files. (Though this is mitigated with Protocol Buffers.)
  5. There are a lot of good formats out there. Even if you don't like XML. Try CSV. Or JSON. Or .properties. Or even XML. Lots of tools exist for parsing these already in lots of languages. And it only takes 5mins to write them again if mysteriously all the source code gets lost.
  6. Diffs become easy. When you check in to version control it is much easier to see what has changed. And view it on the Web. Or your iPhone. With binary, you know something has changed, but you rely on the commit comments to tell you what.
  7. Merges become easy. You still get questions on the web asking how to append one PDF to another. This doesn't happen with Text.
  8. Easier to repair if corrupted. Try and repair a corrupt text document vs. a corrupt zip archive. Enough said.
  9. Every language (and platform) can read or write it. Of course, binary is the native language for computers, so every language will support binary too. But a lot of the classic little tool scripting languages work a lot better with text data. I can't think of a language that works well with binary and not with text (assembler maybe) but not the other way round. And that means your programs can interact with other programs you haven't even thought of, or that were written 30 years before yours. There are reasons Unix was successful.
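To make points 2, 3 and 5 concrete, here is a minimal sketch in Python: a small (hypothetical) record set round-tripped through JSON with nothing but the standard library, then compressed with a standard tool so the "storage tax" disappears while the underlying format stays readable.

```python
import gzip
import json

# Hypothetical records, just for illustration.
records = [
    {"name": "alice", "score": 97},
    {"name": "bob", "score": 85},
]

# Point 5: a standard format parses with the stdlib, no custom parser needed.
text = json.dumps(records, indent=2)   # readable in any editor (point 2)
assert json.loads(text) == records

# Point 3: if size ever matters, standard compression keeps the format
# recoverable by anyone - gunzip gives the readable text back.
compressed = gzip.compress(text.encode("utf-8"))
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))
assert restored == records
```

The same round trip works essentially unchanged with `csv` or `configparser` (for .properties-style files), which is the point: the parser already exists.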

Why not, and use binary instead?

  1. You might have a lot of data - terabytes maybe. And then a factor of 2 could really matter. But premature optimization is still the root of all evil. How about using a human readable format now, and converting later? It won't take much time.
  2. Storage might be free but bandwidth isn't (Jon Skeet in comments). If you are throwing files around the network then size can really make a difference. Even bandwidth to and from disc can be a limiting factor.
  3. Really performance intensive code. Binary can be seriously optimised. There is a reason databases don't normally have their own plain text format.
  4. A binary format might be the standard. So use PNG, MP3 or MPEG. It makes the next guys job easier (for at least the next 10 years).
  5. There are lots of good binary formats out there. Some are global standards for that type of data. Or might be a standard for hardware devices. Some are standard serialization frameworks. A great example is Google Protocol Buffers. Another example: Bencode
  6. Easier to embed binary. Some data already is binary and you need to embed it. This works naturally in binary file formats, but looks ugly and is very inefficient in human readable ones, and usually stops them being human readable.
  7. Deliberate obscurity. Sometimes you don't want it obvious what your data is doing. Encryption is better than accidental security through obscurity, but if you are encrypting you might as well make it binary and be done with it.
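The size argument in points 1 and 2 can be sketched quickly. Below, a hypothetical set of (id, measurement) records is written once as packed binary (a fixed 12 bytes per record via `struct`) and once as JSON text; the text version is measurably larger, which is exactly the overhead that starts to matter at terabyte scale or over a network.

```python
import json
import random
import struct

# Hypothetical data: (id, measurement) pairs.
random.seed(0)
records = [(i, random.random()) for i in range(1000)]

# Packed binary: 4-byte int + 8-byte double per record, no delimiters.
binary = b"".join(struct.pack("<id", i, x) for i, x in records)

# The same data as JSON text.
text = json.dumps(records).encode("utf-8")

assert len(binary) == 12 * len(records)
assert len(text) > len(binary)   # the readable encoding carries real overhead
print(f"binary: {len(binary)} bytes, text: {len(text)} bytes")
```

The exact ratio depends on the data (short integers compress the gap, full-precision floats widen it), which is why "convert later, once you know it matters" is a reasonable default.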

Debatable

  1. Easier to parse. People have claimed that both text and binary are easier to parse. Now clearly the easiest case is when your language or library supports the parsing for you, and this is true for some binary and some human readable formats, so it doesn't really support either side. Binary formats can clearly be chosen so they are easy to parse, but so can human readable ones (think CSV or fixed width), so I think this point is moot. Some binary formats can just be dumped into memory and used as is, so this could be said to be the easiest to parse, especially if numbers (not just strings) are involved. However, I think most people would argue that human readable parsing is easier to debug, as it is easier to see what is going on in the debugger (slightly).
  2. Easier to control. Yes, it is more likely someone will mangle text data in their editor, or will moan when one Unicode format works and another doesn't. With binary data that is less likely. However, people and hardware can still mangle binary data. And you can (and should) specify a text encoding for human-readable data, either flexible or fixed.
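Point 2's advice to specify a text encoding can be shown in a few lines. This sketch (illustrative strings only) round-trips non-ASCII text through the declared codec, then decodes the same bytes with the wrong one to show the silent mojibake that an unspecified encoding invites.

```python
# Non-ASCII content is exactly where an unspecified encoding bites.
text = "café, naïve, 東京"

# Encode with the declared codec: the round trip is lossless.
data = text.encode("utf-8")
assert data.decode("utf-8") == text

# Decode with the wrong codec: no error, just silent mojibake.
mangled = data.decode("latin-1")
assert mangled != text
print(mangled)
```

The same rule applies to files: pass `encoding="utf-8"` explicitly to `open()` rather than relying on the platform default.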

At the end of the day, I don't think either can really claim an advantage here.

Anything else

Are you sure you really want a file? Have you considered a database? :-)

Credits

A lot of this answer is merging together stuff other people wrote in other answers (you can see them there). And especially big thanks to Jon Skeet for his comments (both here and offline) for suggesting ways it could be improved.
