Best compression algorithm for XML?

Question

I barely know a thing about compression, so bear with me (this is probably a stupid and painfully obvious question).

So let's say I have an XML file with a few tags.

<verylongtagnumberone>
  <verylongtagnumbertwo>
    text
  </verylongtagnumbertwo>
</verylongtagnumberone>

Now let's say I have a bunch of these very long tags, with many attributes, across multiple XML files. I need to compress them to the smallest size possible. The best way would be an XML-specific algorithm that assigns each tag a short pseudonym such as vlt1 or vlt2. However, that would not be as 'open' as I am aiming for, so I want to use a common algorithm like DEFLATE or LZ. It would also help if the archive were a .zip file.

Since I'm dealing with plain text (no binary files like images), I'd like an algorithm that suits plain text. Which one produces the smallest file size? (Lossless algorithms are preferred.)
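One way to approach the "which one is smallest" question is simply to measure it on representative data. The following is a minimal sketch, not a benchmark: it builds a made-up XML document with long, repetitive tag names (the `build_sample_xml` helper and its contents are assumptions for illustration only) and compares the output sizes of three lossless compressors from the Python standard library. Only DEFLATE is what a typical .zip archive uses; bzip2 and LZMA are included for comparison.

```python
import bz2
import lzma
import zlib


def build_sample_xml(records: int = 200) -> bytes:
    """Build a hypothetical XML document with long, repetitive tag names."""
    parts = ["<verylongtagnumberone>"]
    for i in range(records):
        parts.append(f"  <verylongtagnumbertwo>text {i}</verylongtagnumbertwo>")
    parts.append("</verylongtagnumberone>")
    return "\n".join(parts).encode("utf-8")


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of `data` under several lossless compressors."""
    return {
        "raw": len(data),
        "deflate (zlib)": len(zlib.compress(data, 9)),  # level 9 = best compression
        "bzip2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    for name, size in compressed_sizes(build_sample_xml()).items():
        print(f"{name:>15}: {size} bytes")
```

The ranking depends on the input, so it is worth running this against real documents rather than the synthetic sample. Because general-purpose compressors already replace repeated strings with short back-references, long tag names tend to cost much less after compression than their raw length suggests.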

By the way, the scenario is this: I am creating a standard for documents, like ODF or MS Office XML, that consist of XML files packaged in a .zip.
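For the packaging scenario described here, a rough sketch using Python's standard `zipfile` module might look like the following. The member names (`content.xml`, `styles.xml`) and the helpers `pack_document` and `unpack_document` are hypothetical and not part of any existing standard; the archive uses DEFLATE so that ordinary zip tools can open it.

```python
import io
import zipfile


def pack_document(files: dict) -> bytes:
    """Pack a mapping of {member name: XML text} into a DEFLATE-compressed zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(
        buffer, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as archive:
        for name, xml_text in files.items():
            # writestr encodes str data as UTF-8 before compressing it
            archive.writestr(name, xml_text)
    return buffer.getvalue()


def unpack_document(blob: bytes) -> dict:
    """Read every member of the archive back as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name).decode("utf-8") for name in archive.namelist()}


if __name__ == "__main__":
    # Hypothetical document parts, loosely modelled on ODF-style packages.
    body = "<verylongtagnumbertwo>text</verylongtagnumbertwo>\n" * 200
    document = {
        "content.xml": f"<verylongtagnumberone>\n{body}</verylongtagnumberone>",
        "styles.xml": "<styles/>",
    }
    blob = pack_document(document)
    raw_size = sum(len(text.encode("utf-8")) for text in document.values())
    print(f"raw: {raw_size} bytes, zipped: {len(blob)} bytes")
```

Note that zip compresses each member independently, so tag names shared between files are not deduplicated across members; many small XML files will compress less well than one larger file with the same content.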

The 'encryption' thing was a typo; it should have been 'compression'.

Recommended answer

There is a W3C standard named EXI (Efficient XML Interchange), not yet released at the time of writing.

It should become THE data format for compressing XML data in the future (it is claimed to be the last binary format that will be needed). Because it is optimized for XML, it compresses XML far more efficiently than any conventional compression algorithm.

With EXI, you can operate on compressed XML data on the fly (without needing to decompress or re-compress it).

EXI = (XML + XML Schema) as binary.

And here is the open-source implementation (I don't know whether it is stable yet):
EXIficient
