Is there a "binary dump" or "get binary representation" function in LibXML2?
Problem description
I need to access the internal binary representation of a loaded XML DOM. There are some dump functions, but I do not see anything like a "binary buffer" (there are only "XML buffers").
My final objective is to compare, byte by byte, the same document before and after some black-box procedure, directly through its binary (current and cached) representations, without conversion to an XML-text representation. So, the question:
Is there a binary representation (of the in-memory structures) in LibXML2 that can be used to compare a dump with the current representation?
I only need to check whether the current and dumped DOMs are equivalent.
Details
This is not a problem of comparing two distinct DOM objects, but something easier: IDs etc. do not change, so no canonical representation is needed (!). I only need access to the internal representation, because that is much faster than converting to text.
Between "before" and "after" there is a black-box procedure, e.g. an XSLT identity transform that affects (or does not affect) some nodes or attributes.
Alternative solutions
- Alternative 1: develop a C function for LibXML2 that compares the two trees node by node and returns false if they differ. During the tree traversal, if the tree structure or some nodeValue has changed, the algorithm stops the comparison (returning false).
- Alternative 2: not ideal, but helpful for some other algorithms: access (in LibXML2) the total number of nodes, the total length or size, or an md5 or sha1 digest. This would only optimize the frequent cases (for my application) where the comparison returns false, avoiding the complete comparison procedure.
NOTES
Related questions
- How to check if a DomDocument was changed with a simple and fast comparison?
- C byte-by-byte comparison
- libxml xmlNodePtr to raw xml string?
Warning for readers using the answered solutions
The problem is about "comparing before with after a black-box operation", but there are two kinds of black boxes here:
- Well-known and controllable ones, like XSLT transforms or the use of a known library. You must know that your black boxes will not change attribute order or ID content, denormalize spaces, etc.
- Full-free ones, like the use of an external editor (e.g. an online editor changing an XHTML document), where user and software can do anything.
I will use a solution in the context of a "well-known" black box. So, my comments in the "Details" section above are valid.
In the context of "full-free" black boxes, you cannot use a "comparison of binary dumps", because only a canonical representation (C14N) is valid for comparison. To compare by C14N criteria, only the "Alternative solutions" (described above) are possible. For alternative 1, you must, among other things, sort each set of attribute nodes before comparing. For alternative 2 (also discussed here), you must generate the C14N dumps.
PS: of course, the use of the C14N criteria is subjective and depends on the application: if, for example, "change of attribute order" is a valid/important change for your application, then a comparison that detects it is valid (!).
Here are the relevant libxml2 methods:
There is a base64 encoding method:
Function: xmlTextWriterWriteBase64
int xmlTextWriterWriteBase64 (xmlTextWriterPtr writer, const char * data, int start, int len)
Write base64-encoded XML text.
- writer: the xmlTextWriterPtr
- data: binary data
- start: the position within the data of the first byte to encode
- len: the number of bytes to encode
- Returns: the bytes written (may be 0 because of buffering) or -1 in case of error
and a BinHex encoding method:
Function: xmlTextWriterWriteBinHex
int xmlTextWriterWriteBinHex (xmlTextWriterPtr writer, const char * data, int start, int len)
Write BinHex-encoded XML text.
- writer: the xmlTextWriterPtr
- data: binary data
- start: the position within the data of the first byte to encode
- len: the number of bytes to encode
- Returns: the bytes written (may be 0 because of buffering) or -1 in case of error