Efficient compression of a file system directory tree with many identical files


Question

We have multiple .NET web applications all sharing quite a few common libraries. None of them are in the GAC.

The deployment constraint is that all of these web applications have dedicated directories, which results in a large number of duplicated DLLs across the total directory structure.

This directory structure is extracted from a single zip archive.

As a result, the zip archive contains many identical files in different directories.

This is huge redundancy, which I want to eliminate in the zip archive; I do not care much if redundant files are created on disk. I see two ways to optimize the zip:

  1. Use Windows symbolic links and junctions to reduce the number of physically identical files.
  2. Use smart compression that does not compress the same file data twice.
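Option 2 can also be approximated by hand if no archiver cooperates. Below is a minimal sketch (the function name `zip_dedup` and the `MANIFEST.txt` entry are illustrative, not from any existing tool): each unique file's bytes are written into the zip exactly once, keyed by content hash, and every path is recorded in a manifest so a small restore script can copy the extracted blobs to all of their locations.

```python
import hashlib
import os
import zipfile

def zip_dedup(root, archive_path):
    """Write each unique file's content once; list all paths in a manifest."""
    seen = {}          # content hash -> archive name of the stored copy
    manifest = []      # lines of "<relative path>\t<archive name of its bytes>"
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                with open(full, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                if digest not in seen:
                    seen[digest] = rel
                    zf.write(full, rel)     # first occurrence: store the bytes
                manifest.append(f"{rel}\t{seen[digest]}")
        zf.writestr("MANIFEST.txt", "\n".join(manifest))
```

With three copies of the same DLL in three application directories, the archive ends up with one data member plus the manifest, instead of three compressed copies.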

Method 1

I used zip and 7z to test compressing directory structures. I used junctions and file symbolic links as the means to reduce space on disk.

Unfortunately, both zip and 7z compress junctions as if they were full-blown directories. 7z compresses a symbolic link as a zero-length file, and its nature as a symbolic link is lost upon decompression. zip traverses the symbolic link and compresses the target data instead, which results in duplicate file content in the archive.

In short, I failed to eliminate the duplicate file data using the first method.
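The zip half of this failure is easy to reproduce: Python's `zipfile` module follows symlinks when writing, just like the zip CLI. A quick Unix sketch (file names are illustrative; on Windows, creating the symlink additionally requires the right privilege):

```python
import os
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "common.dll")
link = os.path.join(tmp, "link.dll")
with open(target, "wb") as f:
    f.write(b"library bytes")
os.symlink(target, link)     # link.dll points at common.dll on disk

archive = os.path.join(tmp, "apps.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.write(target, "common.dll")
    zf.write(link, "link.dll")   # the link is followed: the target bytes are stored again

with zipfile.ZipFile(archive) as zf:
    sizes = [zf.getinfo(n).file_size for n in ("common.dll", "link.dll")]
    print(sizes)   # both entries carry the full 13 bytes of content
```

The link-ness is gone from the archive; both members hold a complete copy of the data.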

Method 2

What I want is exactly described by http://sourceforge.net/p/sevenzip/feature-requests/794/. However, it is nothing more than a feature request.

A comment on the feature request mentions lrzip as an efficient huge-file compressor. I still have to check it, but it does not seem to eliminate duplicate file data the way I would like.

Any help is appreciated.

Answer

mark, how did you try lrzip? It can't detect duplicates inside a compressed archive (a default zip); it should be used on a non-compressing archive (in the Unix world, a tar), or on a zip file created without compression (you will get an archive whose size is almost equal to the sum of the input sizes).
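The point about uncompressed input can be demonstrated with the standard library alone. In this sketch, zlib's 32 KB match window stands in for lrzip's far larger match distance: packing the members with `ZIP_STORED` and then compressing the whole archive as one stream lets the compressor find the duplicate, while per-member deflate cannot.

```python
import io
import os
import zipfile
import zlib

payload = os.urandom(10_000)   # incompressible stand-in for a DLL
members = {"app1/common.dll": payload, "app2/common.dll": payload}

def build(compression):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()

deflated = build(zipfile.ZIP_DEFLATED)   # each member compressed on its own
stored = build(zipfile.ZIP_STORED)       # duplicates remain visible as plain bytes
outer = zlib.compress(stored, 9)         # one stream over the whole archive sees the repeat

print(len(deflated), len(outer))         # the outer pass is roughly half the size
```

The same principle is why lrzip wants a tar or a stored zip as input: its long-range matcher can deduplicate identical files even gigabytes apart, but only if they reach it uncompressed.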

You can also try any multi-file compressor capable of solid mode (rar, 7z), but this may not work if your archive is huge and there is a big distance between duplicates. lrzip supports much greater distances.

Tar (and PAX) on Unix supports hard and soft links: http://www.gnu.org/software/tar/manual/html_section/tar_71.html#SEC140
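The hard-link handling is easy to verify (a quick sketch; file names are illustrative): GNU tar stores the second hard link as a link entry rather than a second copy of the data.

```shell
mkdir -p demo
printf 'shared dll bytes' > demo/a.dll
ln demo/a.dll demo/b.dll              # hard link: one inode, two names
tar -cf demo.tar -C demo a.dll b.dll
tar -tvf demo.tar                     # the b.dll entry reads "b.dll link to a.dll"
```

So on Unix, the method-1 approach from the question actually works end to end: link the duplicates, tar the tree, and the archive carries each file's data only once.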

