Recover corrupt zip or gzip files?

Question

The most common method for corrupting compressed files is to inadvertently do an ASCII-mode FTP transfer, which causes a many-to-one trashing of CR and/or LF characters.
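To make the "many-to-one" point concrete, here is a minimal sketch of one possible line-ending translation. The function name `ascii_mode_mangle` is hypothetical, and the exact mapping an FTP client or server applies varies, so treat the rule below (CRLF and bare CR both become LF) as an assumption rather than a description of any particular implementation.

```python
def ascii_mode_mangle(data: bytes) -> bytes:
    """Collapse CRLF and bare CR to LF, as a text-mode transfer might (assumed rule)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Three different inputs collapse to the same output, so the mapping
# cannot be inverted without extra knowledge about the data.
samples = [b"A\r\nB", b"A\rB", b"A\nB"]
collapsed = {ascii_mode_mangle(s) for s in samples}
print(collapsed)
```

Because several inputs share one output, the damaged file alone does not say which LF bytes were originally something else.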

Obviously, there is information loss, and the best way to fix this problem is to transfer again, in FTP binary mode.

However, if the original is lost, and it's important, how recoverable is the data?

[Actually, I already know what I think is the best answer (it's very difficult but sometimes possible - I'll post more later), and the common non-answers (lots of off-the-shelf programs for repairing CRCs without repairing data), but I thought it would be interesting to try out this question during the stackoverflow beta period, and see if anyone else has gone down the successful-recovery path or discovered tools I don't know about.]

Recommended answer

Quoted in the answer:

Approximately 1 in 256 bytes is known to be corrupted, and the corruption is known to occur only in bytes with the value '\012'. So the byte error rate is 1/256 (0.39% of input), and 2/256 bytes (0.78% of input) are suspect. But since only three bits per smashed byte are affected, the bit error rate is only 3/(256*8): 0.15% is bad, 0.29% is suspect.
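The figures in that passage can be checked with a few lines of arithmetic. This sketch assumes each smashed byte was originally a CR (0x0D, octal 015) that now reads as LF (0x0A, octal 012); the quoted text gives only the LF side, so the CR side is an assumption that happens to match the "three bits" figure. The variable names are illustrative.

```python
LF, CR = 0x0A, 0x0D  # '\012' and '\015' in octal

# Number of bit positions in which the two byte values differ.
bits_per_smash = bin(LF ^ CR).count("1")

byte_error_rate = 1 / 256        # bytes actually corrupted
suspect_byte_rate = 2 / 256      # LF bytes that may or may not be corrupted
bit_error_rate = bits_per_smash / (256 * 8)
suspect_bit_rate = 2 * bits_per_smash / (256 * 8)

for name, rate in [
    ("byte error rate", byte_error_rate),
    ("suspect byte rate", suspect_byte_rate),
    ("bit error rate", bit_error_rate),
    ("suspect bit rate", suspect_bit_rate),
]:
    print(f"{name}: {rate:.2%}")
```

The suspect rate is double the error rate because, in roughly uniform compressed data, about as many LF bytes are genuine as are smashed, so each LF in the damaged file is close to a coin flip.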

...

An error in the compressed input disrupts the decompression process for all subsequent bytes... The fact that the decompressed output is recognizably bad so quickly is cause for hope -- a search for the correct answer can identify wrong answers quickly.
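One way to observe this is to feed a damaged stream to the decoder a byte at a time and note where, if anywhere, it gives up. The sketch below uses Python's `zlib` module on a zlib-wrapped stream rather than a real zip or gzip file; the helper name `probe` and the sample log text are made up for illustration. The decoder does not always raise an error: sometimes it just emits garbage, which is why a check on the output itself is also needed.

```python
import zlib


def probe(stream: bytes):
    """Feed a zlib stream one byte at a time.

    Returns (error_offset, output): error_offset is the index of the input
    byte at which zlib reported an error, or None if it never did.
    """
    decoder = zlib.decompressobj()
    output = bytearray()
    for offset in range(len(stream)):
        try:
            output += decoder.decompress(stream[offset:offset + 1])
        except zlib.error:
            return offset, bytes(output)
    return None, bytes(output)


text = b"".join(b"%04d GET /index.html 200\n" % i for i in range(200))
clean = zlib.compress(text)

damage_at = len(clean) // 2
damaged = bytearray(clean)
damaged[damage_at] ^= 0x07  # flip the three low bits, as a CR/LF swap does
error_offset, partial = probe(bytes(damaged))

print("damaged byte at", damage_at, "- decoder error at", error_offset)
print("bytes of output before giving up:", len(partial))
```

Comparing `partial` with what the data should look like (here, printable log lines) shows how soon after `damage_at` the output stops making sense.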

Ultimately, several techniques were combined to successfully extract reasonable data from these files:

  • Domain-specific parsing of fields and quoted strings
  • Machine learning from previous data with low probability of damage
  • Tolerance for file damage due to other causes (e.g. disk full while logging)
  • Lookahead for guiding the search along the highest-probability paths

These techniques identify 75% of the necessary repairs with certainty, and the remainder are explored highest-probability-first, so that plausible reconstructions are identified immediately.
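As a rough illustration of a highest-probability-first search, the sketch below treats every LF byte in a damaged zlib stream as a suspect, tries both LF and CR there, and always expands the candidate whose decoded output looks most plausible. It is not the quoted author's implementation: `plausibility` is a hypothetical stand-in for the domain-specific parsing and learned model, the lookahead reaches only to the next suspect byte, and `best_first_repair` and `max_nodes` are names invented here.

```python
import heapq
import math
import zlib

LF, CR = 0x0A, 0x0D


def plausibility(output: bytes) -> float:
    """Hypothetical domain model: share of bytes that look like log text."""
    if not output:
        return 1.0
    good = sum(1 for b in output if b in (9, 10) or 32 <= b < 127)
    return good / len(output)


def decode(stream: bytes):
    """Return (output, complete), or None on a structural or checksum error."""
    decoder = zlib.decompressobj()
    try:
        output = decoder.decompress(stream)
    except zlib.error:
        return None
    return output, decoder.eof and not decoder.unused_data


def best_first_repair(damaged: bytes, max_nodes: int = 10000):
    """Try LF/CR at each suspect byte, expanding the likeliest candidate first."""
    suspects = [i for i, b in enumerate(damaged) if b == LF]
    heap = [(0.0, 0, 0, damaged)]  # (cost, tiebreak, suspects decided, bytes)
    tiebreak = 1
    for _step in range(max_nodes):
        if not heap:
            break
        _cost, _tie, decided, candidate = heapq.heappop(heap)
        if decided == len(suspects):
            result = decode(candidate)
            if result is not None and result[1]:
                return result[0]
            continue
        position = suspects[decided]
        # Look ahead only as far as the next undecided suspect byte.
        if decided + 1 < len(suspects):
            horizon = suspects[decided + 1]
        else:
            horizon = len(candidate)
        for value in (LF, CR):
            trial = candidate[:position] + bytes([value]) + candidate[position + 1:]
            result = decode(trial[:horizon])
            if result is None:
                continue  # decoder rejected this choice: prune it
            score = plausibility(result[0])
            if score > 0.0:
                heapq.heappush(heap, (-math.log(score), tiebreak, decided + 1, trial))
                tiebreak += 1
    return None


log = b"".join(b"%04d user%d login ok\n" % (i, i * 7) for i in range(120))
damaged = zlib.compress(log).replace(b"\r", b"\n")
print(best_first_repair(damaged) == log)
```

In this sketch a choice is settled outright whenever the decoder rejects the alternative, and the remaining choices are ordered by the plausibility score. Whether that corresponds to how the quoted work arrived at its 75% figure is not stated in the source.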
