Convert parsed text to UTF-8 with PHP
Question
Following up on my previous question about parsing images and text from complex XML, the only remaining problem is that I don't get the right encoding. The text is in Greek, and the XML file is UTF-8 encoded.
This is the code that parses the XML:
$xml = simplexml_load_file('myfile.xml');
$descriptions = $xml->xpath('//item/description');
foreach ($descriptions as $description_node) {
    // Each description holds an HTML fragment; parse it with DOMDocument.
    $description_dom = new DOMDocument();
    $description_dom->loadHTML((string) $description_node);
    $description_sxml = simplexml_import_dom($description_dom);
    $imgs = $description_sxml->xpath('//img');
    $text = $description_sxml->xpath('//div');
    foreach ($imgs as $image) {
        echo (string) $image['src'];
    }
    foreach ($text as $t) {
        echo (string) $t;
    }
}
If I echo $description_node, the text looks fine, but after I load it into $description_dom and convert it with simplexml_import_dom, it looks like this:
Ïε ιÏλαμικÎÏ ÎºÎ¿Î¹Î½ÏÏηÏεÏ.
Using mb_convert_encoding turns it into:
ýÃÂñù" ÃÂ
What am I doing wrong?
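Output of this shape is typical of UTF-8 bytes being read as a single-byte encoding such as ISO-8859-1, which is most likely what loadHTML does here because the fragment declares no charset. The following is a minimal sketch of that effect, not the original code; the sample string and variable names are assumptions chosen for illustration.

```php
<?php
// Sample Greek text (an assumption, not taken from the original XML file).
$greek = 'Σε';

// Treat every UTF-8 byte as one ISO-8859-1 character and re-encode as UTF-8.
// This mimics a parser that guesses the wrong input encoding.
$garbled = mb_convert_encoding($greek, 'UTF-8', 'ISO-8859-1');

echo $garbled, PHP_EOL;
echo strlen($greek), ' bytes became ', mb_strlen($garbled, 'UTF-8'), ' characters', PHP_EOL;
```

Each Greek letter occupies two bytes in UTF-8, so every letter turns into two Latin characters, which matches the doubled look of the garbled output above.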
Recommended answer
Solution: after $description_dom = new DOMDocument();, I added this line:
$description_html = mb_convert_encoding($description_node, 'HTML-ENTITIES', "UTF-8");
This simply converts the UTF-8 text to HTML entities, so the parser no longer has to guess the encoding. Instead of
$description_dom->loadHTML( (string)$description_node );
I now load the converted HTML:
$description_dom->loadHTML( (string)$description_html );
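Put together, the fix can be sketched as a small self-contained function. This is an illustration under stated assumptions rather than the exact code from the answer: the helper name extract_div_texts and the sample fragment are hypothetical, and mb_encode_numericentity is swapped in for the 'HTML-ENTITIES' conversion because that usage of mb_convert_encoding is deprecated as of PHP 8.2. It assumes the mbstring and dom extensions are available.

```php
<?php
// Hypothetical helper: returns the text of every <div> in an HTML fragment.
function extract_div_texts(string $html): array
{
    // Encode every non-ASCII character as a numeric entity, leaving plain
    // ASCII, so the HTML parser's encoding guess no longer matters.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($encoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div') as $div) {
        // textContent is returned as UTF-8 regardless of the input encoding.
        $texts[] = $div->textContent;
    }
    return $texts;
}

// Sample fragment (an assumption standing in for one <description> node).
$texts = extract_div_texts('<div>Σε ισλαμικές κοινότητες.</div>');
echo $texts[0], PHP_EOL;
```

Querying the DOMDocument directly with DOMXPath also avoids the round trip through simplexml_import_dom, though the original approach works equally well once the input is entity-encoded.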