How to force XPath to use UTF8?


Question

I have an XHTML document that is passed to a PHP app via a Greasemonkey AJAX request. The PHP app uses UTF-8. If I output the POST content straight back into a textarea in the div that receives the AJAX response, everything is still properly encoded as UTF-8.

When I try to parse it using XPath

$dom = new DOMDocument();
$dom->loadHTML($raw2);
$xpath = new DOMXPath($dom);
$query = '//td/text()';
$nodes = $xpath->query($query);
foreach($nodes as $node) {
  var_dump($node->wholeText);
}

the dumped strings are not UTF-8. How do I force DOM/XPath to use UTF-8?

Recommended answer

If it is a fully fledged, valid XHTML document you shouldn't use loadHTML() but load()/loadXML(). These run the XML parser, which honors the encoding given in the document's XML declaration.
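The question holds the document in a string ($raw2) rather than in a file, so loadXML(), the string counterpart of load(), is probably the closer fit. The following is a minimal sketch under that assumption; the inline document is a hypothetical stand-in for the POSTed content, and the $texts array is introduced here only for illustration.

```php
<?php
// Hypothetical stand-in for the POSTed XHTML; "\xC3\x84" is "Ä" encoded as UTF-8.
$raw2 = '<?xml version="1.0" encoding="utf-8"?>'
      . '<html xmlns="http://www.w3.org/1999/xhtml"><body><table>'
      . "<tr><td>\xC3\x84</td></tr>"
      . '</table></body></html>';

$dom = new DOMDocument();
// loadXML() parses a string with the XML parser and returns false on failure.
if (!$dom->loadXML($raw2)) {
    throw new RuntimeException('The document is not well-formed XML');
}

$xpath = new DOMXPath($dom);
// XHTML elements are namespaced, so the query needs a bound prefix.
$xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml');

$texts = [];
foreach ($xpath->query('//h:td/text()') as $node) {
    $texts[] = $node->wholeText;
}

// Show the raw bytes of the first cell as hex.
echo bin2hex($texts[0]), "\n";
```

Note that loadXML() rejects input that is not well-formed, so this only applies when the incoming document really is valid XHTML.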

Given the example XHTML document

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <title>xhtml test</title>
    </head>
    <body>
        <h1>A Table</h1>
        <table>
            <tr><th>A</th><th>O</th><th>U</th></tr>
            <tr><td>Ä</td><td>Ö</td><td>Ü</td></tr>
            <tr><td>ä</td><td>ö</td><td>ü</td></tr>
        </table>
    </body>
</html>

the script

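<?php
// Path to the XHTML file shown above (saved as UTF-8).
$raw2 = 'test.html';

$dom = new DOMDocument();
// load() uses the XML parser, which honors the declared encoding.
$dom->load($raw2);
$xpath = new DOMXPath($dom);
// XHTML elements are namespaced, so bind a prefix for use in the query.
var_dump($xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml'));
$query = '//h:td/text()';
$nodes = $xpath->query($query);
foreach($nodes as $node) {
    foo($node->wholeText);
}

// Print each byte of $s as two hex digits.
function foo($s) {
    for($i=0; $i<strlen($s); $i++) {
        printf('%02X ', ord($s[$i]));
    }
    echo "\n";
}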

prints

bool(true)
C3 84 
C3 96 
C3 9C 
C3 A4 
C3 B6 
C3 BC 

i.e. the output strings are UTF-8 encoded: each character comes out as its two-byte UTF-8 sequence (for example C3 84 for Ä).
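If the incoming markup is not well-formed XML and loadHTML() has to stay, one possible workaround (not part of the answer above) is to turn every non-ASCII character into a numeric character reference before parsing, so the parser never has to guess the byte encoding. The sketch below assumes the mbstring extension is available and that the input really is UTF-8; the sample fragment and the $texts array are hypothetical.

```php
<?php
// Hypothetical HTML fragment; "\xC3\x84" is "Ä" encoded as UTF-8.
$raw2 = "<table><tr><td>\xC3\x84</td></tr></table>";

// Replace every code point from U+0080 upward with a numeric reference
// such as &#196;, leaving a pure-ASCII string (requires ext-mbstring).
$ascii = mb_encode_numericentity($raw2, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($ascii);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$texts = [];
// The HTML parser does not put elements in a namespace, so no prefix is needed.
foreach ($xpath->query('//td/text()') as $node) {
    $texts[] = $node->wholeText;
}

// Show the raw bytes of the first cell as hex.
echo bin2hex($texts[0]), "\n";
```

The DOM extension hands strings back as UTF-8, so once the parser has decoded the references the cell text is again the UTF-8 byte sequence.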
