LibXML internal and output encodings
Question
I'm trying to write XML files with libxml2 in ISO-8859-1. But from the documentation it seems that for each text node that I create I'll have to convert to UTF-8 which is libxml's internal encoding. Then when calling xmlSaveFormatFileEnc() libxml converts to the target encoding and adds the encoding attribute to the document.
Is this assumption correct? For now my code goes roughly like this:
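xmlNode *root_element = NULL, *node4 = NULL;
xmlDoc *doc = NULL;
doc = xmlNewDoc(BAD_CAST XML_DEFAULT_VERSION);
root_element = xmlNewDocNode(doc, NULL, BAD_CAST "root", NULL);
xmlDocSetRootElement(doc, root_element);
char *input_str = getLatin1Data();
int inlen = (int)strlen(input_str);
int outlen = 2 * inlen; /* a Latin-1 byte needs at most 2 bytes in UTF-8 */
unsigned char *utf8_str = malloc(outlen + 1);
/* convert the Latin-1 input to UTF-8, libxml2's internal encoding */
isolat1ToUTF8(utf8_str, &outlen, (const unsigned char *)input_str, &inlen);
utf8_str[outlen] = '\0';
node4 = xmlNewCDataBlock(doc, utf8_str, outlen);
xmlAddChild(root_element, node4);
/* libxml2 converts from UTF-8 to the requested encoding when saving */
xmlSaveFormatFileEnc("test_file.xml", doc, "ISO-8859-1", 1);
free(utf8_str);
xmlFreeDoc(doc);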
Recommended answer
Your assumption is right. Wherever an xmlChar is expected, as in xmlNewCDataBlock or xmlNewText, it is always UTF-8:
From include/libxml/xmlstring.h (libxml 2.8.0):
/**
 * xmlChar:
 *
 * This is a basic byte in an UTF-8 encoded string.
 * It's unsigned allowing to pinpoint case where char * are assigned
 * to xmlChar * (possibly making serialization back impossible).
 */
typedef unsigned char xmlChar;
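To make the two conversion steps concrete, here is a minimal sketch in plain C of what happens to a Latin-1 string on the way into libxml2 and on the way back out at save time. The helper names latin1_to_utf8 and utf8_to_latin1 are hypothetical and are not part of libxml2. They only illustrate the byte-level mapping, and real code would more likely call isolat1ToUTF8 and UTF8Toisolat1 from libxml/encoding.h.

```c
#include <stddef.h>

/* Hypothetical helper (not libxml2 API): convert inlen bytes of ISO-8859-1
 * to UTF-8. The caller must provide room for 2 * inlen bytes in out.
 * Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                             unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < inlen; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;                                  /* ASCII is unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte 110000xx */
            out[n++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation 10xxxxxx */
        }
    }
    return n;
}

/* Hypothetical helper (not libxml2 API): convert UTF-8 back to ISO-8859-1.
 * Returns the number of bytes written, or (size_t)-1 if the input is
 * malformed or contains a code point above U+00FF, which Latin-1
 * cannot represent. */
static size_t utf8_to_latin1(const unsigned char *in, size_t inlen,
                             unsigned char *out)
{
    size_t n = 0, i = 0;
    while (i < inlen) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < inlen &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequences led by 0xC2/0xC3 cover U+0080..U+00FF */
            out[n++] = (unsigned char)(((c & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return (size_t)-1;
        }
    }
    return n;
}
```

Every Latin-1 byte at or above 0x80 becomes two bytes in UTF-8, which is why the output buffer for the first step should be sized at twice the input length. The reverse step can fail, since a document held as UTF-8 may contain characters that the target encoding cannot express.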