DOMDocument and windows-1250 encoding


Problem description

I'm writing code that parses different websites. Some of them use windows-1250 encoding and some use UTF-8. I have no control over those websites, and as you can probably guess, the windows-1250 pages are giving me a headache. Here's the code I'm using:

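    // Parse the fetched page; @ suppresses libxml warnings about malformed HTML
    $doc = new DOMDocument();
    @$doc->loadHTML($response);

    // Point every <a> that has an href at the same URL
    $xpath = new DOMXPath($doc);
    $anchors = $xpath->query("//a[@href]");
    foreach ($anchors as $anchor) {
        $href = $anchor->getAttribute("href");
        $anchor->setAttribute("href", 'http://example.com/');
    }

    // Serialize the modified document back to HTML
    $response = $xpath->document->saveHTML();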

And here's the output in the browser when I run this script:

Warning: DOMDocument::saveHTML(): output conversion failed due to conv error, bytes 0x9A 0x61 0x72 0x6B
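
The bytes in the warning hint at what is going on. In windows-1250, the byte 0x9A encodes "š" (U+0161), so 0x9A 0x61 0x72 0x6B reads as "šark". In UTF-8, however, any byte from 0x80 to 0xBF can only continue a multi-byte sequence and can never start one, so the same four bytes are not valid UTF-8. The sketch below illustrates this in JavaScript (the language of the answer) rather than PHP; isValidUtf8 is a hypothetical helper, not part of PHP or libxml, and it relies on the standard TextDecoder in fatal mode, which throws on malformed input instead of inserting U+FFFD.

```javascript
// Hypothetical helper: returns true only if the bytes form valid UTF-8.
// With { fatal: true }, TextDecoder throws a TypeError on malformed input
// instead of substituting the replacement character U+FFFD.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// The four bytes reported in the DOMDocument::saveHTML() warning.
const fromWarning = Uint8Array.from([0x9a, 0x61, 0x72, 0x6b]);
// The windows-1250 reading of those bytes ("šark"), re-encoded as UTF-8.
// In UTF-8, "š" is the two bytes 0xC5 0xA1.
const asUtf8 = new TextEncoder().encode("\u0161ark");

console.log(isValidUtf8(fromWarning)); // false
console.log(isValidUtf8(asUtf8));      // true
```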

So, is there a way to handle this error with windows-1250 encoding that also works with UTF-8? I tried running utf8_encode on $response, and that gets rid of the warning, but then the international characters are messed up.
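
One common way to handle mixed encodings is to normalize every page to a single encoding before parsing: try strict UTF-8 first, and fall back to windows-1250 only when that fails. This is a sketch in JavaScript rather than PHP, under two assumptions: you have the raw response bytes, and every page that is not valid UTF-8 really is windows-1250 (reading the charset from the Content-Type header or the page's meta tag would be more reliable than guessing). decodeResponse is a hypothetical name, and decoding windows-1250 with TextDecoder needs a runtime with full ICU data, which browsers and current official Node.js builds provide. This approach may also explain the garbled output from utf8_encode: that PHP function assumes its input is ISO-8859-1, not windows-1250.

```javascript
// Hypothetical helper: turn raw response bytes into a JavaScript string.
// Assumption: any page that is not valid UTF-8 is encoded as windows-1250.
function decodeResponse(bytes) {
  try {
    // Strict UTF-8: throws on malformed sequences such as a lone 0x9A byte.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (err) {
    // Fallback: decode as windows-1250 (non-fatal, so any unmapped byte
    // becomes U+FFFD instead of throwing).
    return new TextDecoder("windows-1250").decode(bytes);
  }
}

// The bytes from the warning, as served by a windows-1250 page.
const cp1250Page = Uint8Array.from([0x9a, 0x61, 0x72, 0x6b]);
// The same word as served by a UTF-8 page.
const utf8Page = Uint8Array.from([0xc5, 0xa1, 0x61, 0x72, 0x6b]);

console.log(decodeResponse(cp1250Page)); // šark
console.log(decodeResponse(utf8Page));   // šark
```

Both pages end up as the same string, so whatever parses them afterwards only has to deal with one encoding.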

Solution

If you are just trying to change the href of all your anchor tags, you could use jQuery.

The code would look like this:

    // loop through the anchor tags
    $("a").each(function () { // begin each function

        // set the href attribute
        $(this).attr("href", "http://example.com/");

    }); // end each function

Here is a jsfiddle example: http://jsfiddle.net/fu5fxawm/1/

If you hover over the links you will see that they have been changed.
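
For comparison, the same loop can be written without jQuery using the standard DOM API. This is a sketch: rewriteAnchors is a hypothetical helper name, and it selects only anchors that already have an href, mirroring the //a[@href] XPath query in the question. Like the jQuery version, it runs in the browser, so it changes the links after the page has loaded rather than in the HTML that the PHP script outputs.

```javascript
// Hypothetical helper: set href on every <a> element that already has one.
// `root` is anything with querySelectorAll, e.g. `document` in a browser.
function rewriteAnchors(root, newHref) {
  root.querySelectorAll("a[href]").forEach(function (anchor) {
    anchor.setAttribute("href", newHref);
  });
}

// Usage in a browser:
// rewriteAnchors(document, "http://example.com/");
```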
