Convert parsed text to UTF-8 with PHP

Question

Following up on my previous question about parsing images and text from complex XML: the only remaining problem is that I don't get the right encoding. The text is in Greek and the XML file is UTF-8 encoded. This is the code that parses the XML:

$xml = simplexml_load_file('myfile.xml');

$descriptions = $xml->xpath('//item/description');

foreach ($descriptions as $description_node) {
    // Each description holds an HTML fragment, so parse it on its own
    $description_dom = new DOMDocument();
    $description_dom->loadHTML((string) $description_node);

    $description_sxml = simplexml_import_dom($description_dom);

    $imgs = $description_sxml->xpath('//img');
    $text = $description_sxml->xpath('//div');

    foreach ($imgs as $image) {
        echo (string) $image['src'];
    }

    foreach ($text as $t) {
        echo (string) $t;
    }
}

If I echo $description_node, the text looks fine, but after I get $description_dom with simplexml_import_dom it looks like this: Ïε ιÏÎ»Î±Î¼Î¹ÎºÎ­Ï ÎºÎ¿Î¹Î½ÏÏηÏεÏ. Using mb_convert_encoding turns it into: ýÃÂñù" ÃÂ. What am I doing wrong?

Recommended answer

Solution: right after $description_dom = new DOMDocument(); I added this line:

$description_html = mb_convert_encoding($description_node, 'HTML-ENTITIES', "UTF-8");
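As far as I understand it, this helps because DOMDocument::loadHTML() falls back to ISO-8859-1 when the markup carries no charset declaration, so every byte of a multi-byte UTF-8 character is read as a separate character. The sketch below only imitates that misreading with mb_convert_encoding; the Greek sample string is made up for illustration and is not taken from the original feed.

```php
<?php
// Made-up sample text, not taken from the original feed.
$utf8 = 'Καλημέρα κόσμε';

// Treat every byte of the UTF-8 string as one ISO-8859-1 character and
// re-encode the result as UTF-8. This reproduces the kind of garbling
// described in the question.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $misread, PHP_EOL;

// The damage is reversible as long as no bytes were dropped along the way.
$restored = mb_convert_encoding($misread, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8);
```

Once the non-ASCII characters are written as entities, the fragment is plain ASCII, so the assumed encoding no longer matters.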

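mb_convert_encoding with 'HTML-ENTITIES' as the target converts the non-ASCII characters of the UTF-8 text into HTML entities. Then, instead of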

$description_dom->loadHTML( (string)$description_node );

I now load the converted HTML:

$description_dom->loadHTML( (string)$description_html );
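Putting the pieces together, one possible version of the whole step is sketched below. The function name extractDivText() and the sample string are hypothetical, not part of the original code. Because the 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2, the sketch uses mb_encode_numericentity() instead, which should have the same effect for this purpose.

```php
<?php
// Hypothetical helper: returns the text of every <div> in a UTF-8 HTML fragment.
function extractDivText(string $htmlFragment): array
{
    // Write every non-ASCII character as a numeric entity so that
    // loadHTML() cannot misread the bytes as ISO-8859-1.
    $encoded = mb_encode_numericentity(
        $htmlFragment,
        [0x80, 0x10FFFF, 0, 0x1FFFFF], // start, end, offset, mask
        'UTF-8'
    );

    $dom = new DOMDocument();

    // Fragments from feeds are often not valid HTML; collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($encoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach (simplexml_import_dom($dom)->xpath('//div') as $div) {
        $texts[] = (string) $div;
    }

    return $texts;
}

// Made-up sample fragment for illustration.
print_r(extractDivText('<div>Καλημέρα κόσμε</div><img src="a.jpg">'));
```

Inside the original loop this would be called as extractDivText((string) $description_node), with the image lookup handled the same way using the //img XPath.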
