Read German text from XML and write to a PDF
Problem description
I have an XML file encoded in UTF-8. I have to read the value of a std::string variable from it using the PugiXML library. After reading the value I print it to the console, but in my actual project I have to put that value into a PDF (using the LibHaru library). My MWE is as follows:
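#include <iostream>
#include <string>
#include "pugiconfig.hpp"
#include "pugixml.hpp"

using namespace pugi;

int main()
{
    // FILEPATH is the path to the UTF-8 encoded XML file (defined elsewhere).
    pugi::xml_document doc;
    pugi::xml_parse_result result = doc.load_file(FILEPATH);
    if (!result)
    {
        std::cerr << "Failed to load XML: " << result.description() << std::endl;
        return 1;
    }

    // child_value() returns the element text as a plain char string.
    xml_node root_node = doc.child("Report");
    xml_node SystemName_node = root_node.child("SystemName");
    std::string strSystemName = SystemName_node.child_value();

    std::cout << " The name of the system is: " << strSystemName << std::endl;
    return 0;
}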
I am reading the value of the variable std::string strSystemName from an XML file using the pugixml library. After reading it I print it on screen (in my actual project, I write it to a PDF file). Problem: while debugging, I found that strange characters are read from the XML file (which is already in UTF-8); they appear whether I print the variable on screen or put it into the PDF.
IMPORTANT: Printing to the console is not too important. What matters is putting the text correctly into the PDF file, which also uses UTF-8 encoding. But I think that storing the value in a std::string is somehow causing the problem, so that the wrong value is passed to the PDF writer.
PS: I am using VS2010, which is without C++11.
Recommended answer
The problem here is that std::cout just passes the UTF-8 bytes in the string through to the console. On Windows the console normally does not run in UTF-8 but in, for example, code page 1252, so the two bytes of a UTF-8 'ä' are displayed as two characters.
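To make the byte-versus-character mismatch concrete, here is a minimal sketch that counts the code points in a UTF-8 string and compares that with the byte count. The helper name count_utf8_code_points is hypothetical (it is not part of pugixml or any library mentioned here), and the function assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t count_utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not negative.
        unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For 'ä' encoded as the bytes 0xC3 0xA4, the function returns 1 while std::string::size() returns 2. A console running in code page 1252 draws one glyph per byte, so it shows those two bytes as the two characters 'Ã' and '¤'.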
Your solution is either to switch the console to UTF-8 (see this answer), or to convert your UTF-8 string into a CP-1252 string. I think this is going to require MultiByteToWideChar (specifying UTF-8) followed by WideCharToMultiByte (specifying CP-1252).
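The two options could be sketched roughly as follows. This is a Windows-only illustration written without C++11 so that it should build with VS2010. The helper names use_utf8_console and utf8_to_cp1252 are hypothetical, and error handling is reduced to returning false or an empty string.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>

// Option 1 (hypothetical helper): switch the console output code page
// to UTF-8. Whether every character is then drawn correctly also
// depends on the console font.
bool use_utf8_console()
{
    return SetConsoleOutputCP(CP_UTF8) != 0;
}

// Option 2 (hypothetical helper): convert UTF-8 to code page 1252 by
// going through UTF-16. Characters that do not exist in CP-1252 are
// replaced by the system default character.
std::string utf8_to_cp1252(const std::string& utf8)
{
    if (utf8.empty()) return std::string();
    const int srcLen = static_cast<int>(utf8.size());

    // Step 1: UTF-8 -> UTF-16. The first call only asks for the length.
    int wideLen = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), srcLen, NULL, 0);
    if (wideLen <= 0) return std::string();
    std::vector<wchar_t> wide(wideLen);
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), srcLen, &wide[0], wideLen);

    // Step 2: UTF-16 -> CP-1252, again querying the length first.
    int outLen = WideCharToMultiByte(1252, 0, &wide[0], wideLen,
                                     NULL, 0, NULL, NULL);
    if (outLen <= 0) return std::string();
    std::vector<char> out(outLen);
    WideCharToMultiByte(1252, 0, &wide[0], wideLen,
                        &out[0], outLen, NULL, NULL);
    return std::string(out.begin(), out.end());
}
#endif  // _WIN32
```

With the second helper, the console line from the question would become std::cout << utf8_to_cp1252(strSystemName). The original UTF-8 string stays untouched for anything that expects UTF-8.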
To debug your actual problem (passing UTF-8 strings into pugixml), you need to look at the actual bytes in the strings and check that they are what you think they are.
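One way to inspect the bytes is a small hex dump, sketched below. The function name hex_dump is hypothetical; it uses only the standard library and no C++11 features.

```cpp
#include <string>

// Hypothetical helper: return the bytes of s as space-separated
// uppercase hex pairs, for example "C3 A4".
std::string hex_dump(const std::string& s)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned char b = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

Printing hex_dump(strSystemName) right after the pugixml call shows what is really in the string. As an illustration with a made-up value, "Gerät" in correct UTF-8 would give 47 65 72 C3 A4 74. A single E4 in place of C3 A4 would indicate CP-1252 or Latin-1 data, and C3 83 C2 A4 would indicate text that was encoded to UTF-8 twice.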