Best compression algorithm for XML?


Question

I barely know a thing about compression, so bear with me (this is probably a stupid and painfully obvious question).

So let's say I have an XML file with a few tags.

<verylongtagnumberone>
  <verylongtagnumbertwo>
    text
  </verylongtagnumbertwo>
</verylongtagnumberone>

Now let's say I have a bunch of these very long tags with many attributes in my multiple XML files. I need to compress them to the smallest size possible. The best way would be to use an XML-specific algorithm which assigns individual tags pseudonyms like vlt1 or vlt2. However, this wouldn't be as 'open' a way as I'm trying to go for, and I want to use a common algorithm like DEFLATE or LZ. It would also help if the archive were a .zip file.

Since I'm dealing with plain text (no binary files like images), I'd like an algorithm that suits plain text. Which one produces the smallest file size (lossless algorithms are preferred)?
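One way to approach this empirically is to run a few general-purpose compressors over a representative file and compare the sizes. The sketch below assumes Python's standard library (`zlib` for DEFLATE, plus `bz2` and `lzma`); the sample document and the helper names `make_sample_xml` and `compressed_sizes` are hypothetical and only for illustration. Results depend heavily on the real data, so treat it as a measuring harness, not a verdict.

```python
import bz2
import lzma
import zlib


def make_sample_xml(records: int = 500) -> bytes:
    """Build a hypothetical XML document with long, repeated tag names."""
    rows = "".join(
        "<verylongtagnumberone><verylongtagnumbertwo>"
        f"text {i}"
        "</verylongtagnumbertwo></verylongtagnumberone>\n"
        for i in range(records)
    )
    return f"<root>\n{rows}</root>\n".encode("utf-8")


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the size in bytes of the input under each lossless codec."""
    return {
        "raw": len(data),
        "deflate": len(zlib.compress(data, 9)),      # zlib wrapper around DEFLATE
        "bzip2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),  # .xz container
    }


if __name__ == "__main__":
    for name, size in compressed_sizes(make_sample_xml()).items():
        print(f"{name:>8}: {size} bytes")
```

Because long tag names repeat verbatim, dictionary-based coders replace them with short back-references, so the length of the tags matters less than it might seem. Running this on real documents shows how much.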

By the way, the scenario is this: I am creating a standard for documents, like ODF or MS Office XML, that contain XML files, packaged in a .zip.
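For that packaging scenario, a minimal sketch using Python's standard `zipfile` module could look like the following. The function name `package_documents` and the member names `content.xml` and `styles.xml` are assumptions made up for the example, not part of any existing format.

```python
import io
import zipfile


def package_documents(files: dict[str, bytes]) -> bytes:
    """Pack the given XML members into an in-memory .zip using DEFLATE."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(
        buffer,
        mode="w",
        compression=zipfile.ZIP_DEFLATED,  # the method zip readers commonly support
        compresslevel=9,                   # strongest DEFLATE setting
    ) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical members of the document package.
    members = {
        "content.xml": b"<doc><verylongtagnumberone>text</verylongtagnumberone></doc>",
        "styles.xml": b"<styles><verylongtagnumbertwo/></styles>",
    }
    blob = package_documents(members)
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        print(archive.namelist())
```

Python's `zipfile` also offers `ZIP_BZIP2` and `ZIP_LZMA`, which may produce smaller archives, but fewer zip tools can read them, which works against the goal of an open format.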

EDIT: The 'encryption' thing was a typo; it should have been 'compression'.

Answer

There is a W3C (not-yet-released) standard named EXI (Efficient XML Interchange).

It should become THE data format for compressing XML data in the future (it is claimed to be the last binary format needed). Being optimized for XML, it compresses XML far more efficiently than any conventional compression algorithm.

With EXI, you can operate on compressed XML data on the fly (without the need to uncompress or re-compress it).

EXI = (XML + XMLSchema) as binary.

Here is an open-source implementation (I don't know whether it's stable yet):
Exificient
